GitHub Tool Disables AI Safety in Meta, Google Models

May 28 · Source: 디지털투데이

Summary

Safety features in some open-weight AI models from Meta and Google can be disabled in minutes using tools available on GitHub. In tests, Meta's Llama 3.3 and Google's Gemma 3 answered dangerous questions once their safety controls had been removed. Llama 3.3's safeguards, for example, were disabled in under 10 minutes with a GitHub tool called Heretic. The tool, created by Philipp Emanuel Weitman, has been used to modify more than 3,500 models, which together have accumulated more than 13 million downloads. Google acknowledges this as a known technical challenge for open models and says it conducts strict internal safety evaluations. The findings highlight a core debate in the open-weight AI industry: how to balance openness with safety when it is difficult to prevent third parties from removing safeguards after a model is released. The case could fuel further discussion of model release policies and control over derivative models.


This is an AI-generated audio summary. Always check the original source for complete reporting.
