Benjamin Marie (@bnjmnmarie): "Gated DeltaNet-2 (GDN-2) looks great! We know GDN works in SOTA LLMs. Qwen3.5-397B-A17B and Qwen3.6-27B use hybrid architectures with mostly Gated DeltaNet blocks, mixed with Gated Attention at a 3:1 ratio. The main change in GDN-2:

Gated DeltaNet-2 (GDN-2) looks great!

We know GDN works in SOTA LLMs.

Qwen3.5-397B-A17B and Qwen3.6-27B use hybrid architectures with mostly Gated DeltaNet blocks, mixed with Gated Attention at a 3:1 ratio.

The main change in GDN-2:

> GDN uses one scalar gate for both erasing old memory and writing new content.

> GDN-2 separates the two.

> One channel-wise gate controls key-side erase. Another controls value-side write.

That should make memory updates more precise, while keeping the same general GDN idea.

github.com

Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention - NVlabs/GatedDeltaNet-2

May 22

6:05 PM