AI Models Miss Expert Judgment 30% of Time: Pearl Study

May 13 · Source: The AI Journal

Summary

Leading AI systems align with expert judgment only about 70% of the time, according to a new evaluation from Pearl Enterprise. The evaluation assessed 25 models from major developers including OpenAI, Anthropic, Google DeepMind, and Microsoft across five areas: business, health, law, pets, and technology, using approximately 510 questions answered by credentialed experts. Gains on public benchmarks did not consistently translate to real-world professional questions, and some widely used models dropped to about 20% alignment in certain domains. OpenAI’s GPT 5.5 led with 72.7% expert alignment, closely followed by GPT 5 and Claude Opus 4.7. No tested model exceeded 73% expert alignment in aggregate, suggesting that current top systems may be converging below true expert-level performance on professional tasks. This raises critical questions for companies deploying AI in high-stakes fields like healthcare, law, and finance, where "almost right is still wrong."

This is an AI-generated audio summary. Always check the original source for complete reporting.
