GPT-5.5 Beats Claude Fable 5 on Agents' Last Exam AI Benchmark
Summary
OpenAI's GPT-5.5 has surpassed Anthropic's new Claude Fable 5 on Agents' Last Exam (ALE), a challenging new AI benchmark. GPT-5.5 scored a 24.0% pass rate, two percentage points ahead of Fable 5's 22.0%. The benchmark, developed by UC Berkeley, measures whether AI can perform real, valuable professional tasks. Both models still fail more than three-quarters of the tasks, which highlights how much further AI needs to develop. ALE differs from earlier benchmarks in that it evaluates AI on complex, multi-step workflows, such as creating 3D models or analyzing neuroimages, rather than on simple coding puzzles. Most tasks are evaluated with code, which avoids issues seen in earlier benchmarks. The bottom line is that while AI is making strides, even the most advanced models remain a long way from consistently handling real-world professional work.
This is an AI-generated audio summary. Always check the original source for complete reporting.