NVIDIA Gated DeltaNet-2: Decouples Erase & Write in AI

Source: MarkTechPost

Summary

NVIDIA has released Gated DeltaNet-2, a new linear attention layer that splits the active memory edit into two channel-wise gates. It targets a limitation of earlier delta-rule models, where a single scalar gate controlled both forgetting old content and committing new content. Gated DeltaNet-2 replaces that scalar with a channel-wise erase gate and a channel-wise write gate, so the two decisions are made separately and memory updates can be controlled more precisely. The aim is to let AI models edit their compressed memory without scrambling existing information. At 1.3 billion parameters, trained on 100 billion FineWeb-Edu tokens, Gated DeltaNet-2 outperforms several previous models, including Mamba-2, Gated DeltaNet, KDA, and Mamba-3, across various benchmarks. The development could lead to more efficient and effective AI memory systems.
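To make the difference between one scalar gate and two channel-wise gates concrete, here is a minimal sketch in plain Python. It is an illustration only: the summary does not give the layer's equations, so the update form, the function names, and the choice to apply the gates per value channel are all assumptions, and the decay (forgetting) gate used by gated delta-rule models is left out for brevity.

```python
# Illustrative sketch only: NOT NVIDIA's published formulation.
# The memory S is a d_v x d_k matrix that maps keys to values.

def read(S, k):
    """Return S @ k: what the memory currently stores under key k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def delta_update_scalar(S, k, v, beta):
    """Classic delta rule: a single scalar beta scales both the erase
    of the old value and the write of the new value."""
    old = read(S, k)
    return [
        [S[i][j] - beta * old[i] * k[j] + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(S))
    ]


def delta_update_decoupled(S, k, v, erase, write):
    """Hypothetical decoupled update: erase[i] and write[i] are separate
    per-channel gates in [0, 1] (assumed here to act on value channels)."""
    old = read(S, k)
    return [
        [S[i][j] - erase[i] * old[i] * k[j] + write[i] * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(S))
    ]


S = [[1.0, 0.0], [2.0, 0.0]]   # memory currently stores [1, 2] under key k
k = [1.0, 0.0]                 # unit-norm key
v = [5.0, 7.0]                 # new value to commit

coupled = delta_update_scalar(S, k, v, beta=0.5)
# Channel 0 keeps its old content and still accepts the new value;
# channel 1 fully overwrites. A single scalar cannot express this mix.
decoupled = delta_update_decoupled(S, k, v, erase=[0.0, 1.0], write=[1.0, 1.0])

print(read(coupled, k))
print(read(decoupled, k))
```

When the erase and write gates are both set to the same constant in every channel, this sketch reduces to the scalar delta rule, which is one way to see the decoupled form as a generalization rather than a different mechanism.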

Read the full article on MarkTechPost

This is an AI-generated audio summary. Always check the original source for complete reporting.
