China's GLM-5.2 Tops OpenAI & Google in AI Benchmark

Source: OfficeChai

Summary

A Chinese model, GLM-5.2, has outscored every OpenAI and Google model on GDPval-AA v2, a benchmark for real-world tasks. The model, from Beijing-based Knowledge Atlas, placed third with an Elo score of 1524. Only two Anthropic models, Claude Fable 5 and Claude Opus 4.8, scored higher. For comparison, GPT-5.5 scores 1509, and Google's best, Gemini 3.5 Flash, scores 1357. GDPval-AA measures multi-turn, agentic tasks designed to mirror actual paid knowledge work rather than isolated puzzles. GLM-5.2 averaged about 31 turns per task across nearly 2,000 matches. The model also leads open models on the Agentic Index and performed well on AA-Briefcase, another agentic knowledge work benchmark, which suggests it can handle complex professional briefs. On this benchmark, a Chinese model now outperforms the top proprietary models from OpenAI and Google, though it still trails the two Anthropic models.
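
To give a rough sense of what these Elo gaps mean, the sketch below converts a rating difference into an expected head-to-head score. It assumes the conventional Elo formula with a 400-point scale; the summary does not say how GDPval-AA computes its ratings, so the function name `expected_score` and the scale are illustrative assumptions, not the benchmark's published method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo model.

    Assumption: standard logistic Elo with a 400-point scale. The
    benchmark's actual rating method is not described in the summary.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the summary above.
ratings = {
    "GLM-5.2": 1524,
    "GPT-5.5": 1509,
    "Gemini 3.5 Flash": 1357,
}

for rival in ("GPT-5.5", "Gemini 3.5 Flash"):
    p = expected_score(ratings["GLM-5.2"], ratings[rival])
    print(f"GLM-5.2 vs {rival}: expected score {p:.2f}")
```

Under that assumption, the 15-point lead over GPT-5.5 corresponds to an expected score of roughly 0.52, close to an even match, while the 167-point lead over Gemini 3.5 Flash corresponds to roughly 0.72.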


This is an AI-generated audio summary. Always check the original source for complete reporting.

