AI Models Miss Expert Judgment 30% of Time: Pearl Study
Summary
Leading AI systems align with expert judgment only about 70% of the time, according to a new evaluation from Pearl Enterprise. Pearl assessed 25 models from major developers, including OpenAI, Anthropic, Google DeepMind, and Microsoft, across five areas: business, health, law, pets, and technology. The evaluation used approximately 510 questions answered by credentialed experts. OpenAI’s GPT 5.5 led with 72.7% expert alignment, closely followed by GPT 5 and Claude Opus 4.7. No tested model exceeded 73% expert alignment in aggregate, and some widely used models dropped to about 20% alignment in certain domains. The results indicate that gains on public benchmarks do not consistently translate to real-world professional questions, and that current top systems may be converging below true expert-level performance on professional tasks. This raises critical questions for companies deploying AI in high-stakes fields like healthcare, law, and finance, where "almost right is still wrong."
This is an AI-generated audio summary. Always check the original source for complete reporting.