Sakana AI's Fugu: Outperforming Claude Fable 5 in Coding
Summary
Tokyo-based Sakana AI has launched a new platform called Fugu, which it claims can outperform Anthropic's Claude Fable 5 on certain coding benchmarks. What sets Fugu apart is its approach: rather than relying on a single large language model, as traditional AI systems do, it coordinates multiple models through one interface to handle complex tasks.

On LiveCodeBench, which evaluates coding and software problem-solving skills, Fugu Ultra scored 93.2 and the standard Fugu model scored 92.9, both higher than Claude Fable 5's 89.8 on the same test. Fugu also performed well on GPQA-Diamond, a benchmark for scientific reasoning: both Fugu and Fugu Ultra scored 95.5, topping Anthropic's earlier Claude Mythos Preview model, which scored 94.6. These results suggest that the multi-model approach is competitive on advanced scientific reasoning as well as on coding.

Sakana AI launched two versions: a standard Fugu for general tasks and Fugu Ultra for demanding workloads such as AI research and cybersecurity analysis. According to internal tests, Fugu outperformed models such as Google's Gemini 3.1 Pro and OpenAI's GPT-5.5 on specialized tasks. The bottom line is that Sakana AI's multi-model strategy offers a different path in the competitive AI landscape.
This is an AI-generated audio summary. Always check the original source for complete reporting.