Chinese AI Nears Claude 4.5 Opus in Safety Awareness

1h ago·0:00 listen·Source: The Eastern Herald

Summary

Chinese AI models are now nearly as good as Anthropic's Claude 4.5 Opus on a key safety awareness test. In just months, models from DeepSeek, Moonshot AI, and Zhipu AI have made significant progress. The Claude 4.5 Opus benchmark is around 80 percent on Neo Research's evaluation-awareness metric. DeepSeek-V3.2 scores about 67 percent, Moonshot-Kimi-K3 is at 71 percent, and Zhipu-GLM-5 sits at 64 percent. This rapid improvement highlights advancements in Chinese AI architecture and training data. What's interesting is how this is measured. Neo Research uses a method based on Anthropic's misalignment test framework. It puts AI models in fictional scenarios where they can either follow evaluation rules or pursue different goals. The metric measures how often the model recognizes it's in an evaluation context. This rapid progression has major implications for how we evaluate AI safety. It shows AI models are becoming more capable of understanding when they are being tested for safety.

Read the full article on The Eastern Herald →

This is an AI-generated audio summary. Always check the original source for complete reporting.

Chinese AI Nears Claude 4.5 Opus in Safety Awareness

Summary

OpenText's €105M Ireland AI & Cloud Investment Doubles Down

Databricks Omnigent: Open-Source AI Agent Orchestration

Zcash: Anthropic AI finds no serious vulnerabilities