AI Codes 60,000 Lines: MirrorCode Benchmark Released

Source: Tech Times

Summary

AI models can now autonomously engineer more software than previously thought possible, according to the new MirrorCode benchmark from Epoch AI and the AI safety organization METR. In one test, Claude Opus 4.7 reimplemented a configuration programming language with approximately 60,000 lines of code, the largest autonomous coding achievement documented in any public evaluation so far. In another, Opus 4.7 reimplemented a bioinformatics toolkit of about 16,000 lines of Go code in 14 hours at a cost of $251. Epoch AI estimates that a human engineer working without AI would need two to seventeen weeks for the same task. MirrorCode evaluates a model by giving it only a compiled program and its documentation: no source code, no internet access, and no human help. The model must then write new source code that exactly reproduces the original program's behavior. The results point to a significant leap in AI's capacity for independent software engineering.
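To make "reproduces the original program's behavior" concrete, here is a minimal sketch of differential testing: run the original and the reimplementation on the same inputs and flag any difference in output or exit code. This illustrates the general idea only. The summary does not describe how MirrorCode actually scores a submission, and the names `run`, `find_mismatches`, `REFERENCE`, `GOOD`, and `BAD` are hypothetical. Small Python one-liners stand in for the compiled program and the candidate so that the example runs anywhere.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run(cmd: Sequence[str], stdin_text: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Run a command with the given stdin and return (exit code, stdout)."""
    result = subprocess.run(
        list(cmd), input=stdin_text, capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout


def find_mismatches(
    reference_cmd: Sequence[str],
    candidate_cmd: Sequence[str],
    inputs: Sequence[str],
) -> List[str]:
    """Return every input on which the two programs behave differently."""
    return [
        text
        for text in inputs
        if run(reference_cmd, text) != run(candidate_cmd, text)
    ]


# Hypothetical stand-ins: the "original" upper-cases its input.
REFERENCE = [sys.executable, "-c",
             "import sys; print(sys.stdin.read().upper(), end='')"]
# A faithful reimplementation, written differently but behaving the same.
GOOD = [sys.executable, "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())"]
# A flawed reimplementation that only capitalizes the first letter of each word.
BAD = [sys.executable, "-c",
       "import sys; sys.stdout.write(sys.stdin.read().title())"]

if __name__ == "__main__":
    print(find_mismatches(REFERENCE, GOOD, ["abc", "hello world"]))
    print(find_mismatches(REFERENCE, BAD, ["123", "abc"]))
```

A check like this can only show agreement on the inputs it tries, so a real evaluation would need a broad test suite to justify a claim of exact behavioral equivalence.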

This is an AI-generated audio summary. Always check the original source for complete reporting.
