Gated DeltaNet-2 (GDN-2) looks great!
We know GDN works in SOTA LLMs.
Qwen3.5-397B-A17B and Qwen3.6-27B use hybrid architectures built mostly from Gated DeltaNet blocks, with three Gated DeltaNet blocks for every Gated Attention block (a 3:1 ratio).
The main change in GDN-2:
> GDN uses one scalar gate for both erasing old memory and writing new content.
> GDN-2 separates the two.
> One channel-wise gate controls key-side erase. Another controls value-side write.
That should make memory updates more precise, since the model can decide what to erase and what to write independently, while keeping the same general GDN idea.
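To make the difference concrete, here is a toy sketch in plain Python. It is not the official implementation: the function names, the `erase` and `write` vectors, and the exact form of the decoupled update are my own assumptions based only on the description above, and the decay gate that GDN also uses is left out to keep the example small. The state `S` is a small value-by-key matrix stored as nested lists.

```python
def matvec(S, k):
    """Return S @ k for a d_v x d_k nested-list matrix S."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def coupled_update(S, k, v, beta):
    """Delta-rule step where one scalar beta scales both erase and write."""
    old = matvec(S, k)  # value currently stored under key k
    return [
        [S[i][j] - beta * old[i] * k[j] + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def decoupled_update(S, k, v, erase, write):
    """Assumed decoupled step (illustrative, not the paper's formula):
    erase[j] gates the erase term per key channel,
    write[i] gates the write term per value channel."""
    old = matvec(S, k)
    return [
        [S[i][j] - old[i] * erase[j] * k[j] + write[i] * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


S = [[1.0, 0.0], [0.0, 2.0]]
k = [1.0, 0.0]
v = [3.0, 4.0]

# One scalar gate: erase and write are tied together.
tied = coupled_update(S, k, v, beta=0.5)

# Setting both gate vectors to the same constant recovers the tied update.
same = decoupled_update(S, k, v, erase=[0.5, 0.5], write=[0.5, 0.5])

# Decoupled gates can also erase the old value without writing anything new.
erase_only = decoupled_update(S, k, v, erase=[1.0, 1.0], write=[0.0, 0.0])
```

In this toy version the tied update is just the special case where every entry of both gate vectors equals the same scalar, which is one way to read "keeping the same general GDN idea".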
Code: github.com/NVlabs/GatedDeltaNet-2 (official PyTorch implementation of "Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention")