NVIDIA Gated DeltaNet-2: Decouples Erase & Write in AI
Summary
NVIDIA has released Gated DeltaNet-2, a new linear attention layer that aims to improve how AI models edit compressed memory without scrambling existing information. It addresses a problem in earlier delta-rule models, where a single scalar gate controlled both forgetting old content and committing new content. Gated DeltaNet-2 splits that active memory edit into two separate decisions: a channel-wise erase gate and a channel-wise write gate. Because each channel can be erased or written independently, memory updates can be controlled more precisely. A 1.3-billion-parameter model built on the layer was trained on 100 billion FineWeb-Edu tokens. It outperforms several previous models, including Mamba-2, Gated DeltaNet, KDA, and Mamba-3, across various benchmarks. The approach could lead to more efficient and effective memory systems for AI models.
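The difference between one scalar gate and two channel-wise gates can be illustrated with a toy memory matrix. The sketch below is an assumption-laden illustration, not NVIDIA's formulation: the summary does not give the actual update equations, so the function names, the choice to gate along the value channels, and the omission of any decay term are all hypothetical. It only shows the general idea that erasing and writing can be scaled separately per channel.

```python
# Toy illustration only. The update rules, the function names, and the choice
# of gating per value channel are assumptions, not the published layer.

def matvec(S, k):
    """Read from memory: what the matrix S currently returns for key k."""
    return [sum(s * x for s, x in zip(row, k)) for row in S]

def scalar_delta_update(S, k, v, beta):
    """Classic delta rule: one scalar beta scales both erase and write."""
    old = matvec(S, k)
    return [[S[i][j] + beta * (v[i] - old[i]) * k[j] for j in range(len(k))]
            for i in range(len(v))]

def decoupled_update(S, k, v, erase, write):
    """Hypothetical decoupled rule: erase[i] scales how much old content is
    removed from channel i, write[i] scales how much new content is added."""
    old = matvec(S, k)
    return [[S[i][j] - erase[i] * old[i] * k[j] + write[i] * v[i] * k[j]
             for j in range(len(k))]
            for i in range(len(v))]

# Two value channels, two key dimensions, unit-length key.
S = [[2.0, 0.0], [3.0, 0.0]]
k = [1.0, 0.0]
v = [5.0, 7.0]

# Channel 0: erase the old value but write nothing new.
# Channel 1: keep the old value and add the new one on top.
updated = decoupled_update(S, k, v, erase=[1.0, 0.0], write=[0.0, 1.0])
print(updated)
```

In this sketch, setting both gates to the same constant in every channel reproduces the scalar delta rule, which is the coupling the summary describes in earlier models. A scalar gate cannot express the mixed behavior in the usage example, where one channel is cleared while another is only written to.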
This is an AI-generated audio summary. Always check the original source for complete reporting.