GPT-5.5 Beats Claude Fable 5 on Agents' Last Exam AI Benchmark

Jun 12 · Source: OpenTools

Summary

OpenAI's GPT-5.5 has edged out Anthropic's new Claude Fable 5 on Agents' Last Exam (ALE), a challenging new AI benchmark developed by UC Berkeley. GPT-5.5 scored a 24.0% pass rate, two percentage points ahead of Fable 5's 22.0%. The benchmark measures whether AI can perform real, valuable professional tasks, and both models still fail more than three-quarters of the time, which highlights how far AI still has to go. ALE stands apart because it evaluates AI on complex, multi-step workflows, such as creating 3D models or analyzing neuroimages, rather than simple coding puzzles. Most tasks are scored with code-based evaluation, which avoids issues seen in earlier benchmarks. While AI is making strides, even the most advanced models remain a long way from consistently handling real-world professional work.


This is an AI-generated audio summary. Always check the original source for complete reporting.
