Gated DeltaNet-2 (GDN-2) looks great!

We know GDN works in SOTA LLMs.

Qwen3.5-397B-A17B and Qwen3.6-27B use hybrid architectures built mostly from Gated DeltaNet blocks, interleaved with Gated Attention blocks at a 3:1 ratio (three GDN blocks for every attention block).

The main change in GDN-2:

> GDN uses one scalar gate for both erasing old memory and writing new content.

> GDN-2 separates the two.

> One channel-wise gate controls key-side erase. Another controls value-side write.

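That should make memory updates more precise: what gets erased on the key side no longer has to match how strongly new content is written on the value side. The general GDN idea stays the same.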
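
To make the difference concrete, here is a rough sketch in plain Python. `gdn_step` follows the usual Gated DeltaNet recurrence, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, where the single scalar `beta` scales both the erase and the write. `decoupled_step` is only my guess at what "separate channel-wise gates" could look like: the names `erase` and `write`, and the exact place each gate enters the update, are assumptions and not taken from the GDN-2 paper.

```python
def read(S, k):
    """Return S @ k: the value the memory currently associates with key k."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]


def gdn_step(S, k, v, alpha, beta):
    """One GDN-style update. S has shape (d_v, d_k), stored as a list of rows.

    alpha: scalar decay gate. beta: one scalar shared by erase and write.
    """
    old = read(S, k)
    return [
        [alpha * (S[i][j] - beta * old[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def decoupled_step(S, k, v, alpha, erase, write):
    """Hypothetical decoupled update (an assumption, not the published formula).

    erase: one gate per key channel (length d_k), scales what is removed.
    write: one gate per value channel (length d_v), scales what is added.
    """
    old = read(S, k)
    return [
        [alpha * (S[i][j] - old[i] * erase[j] * k[j]) + write[i] * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


if __name__ == "__main__":
    S = [[0.5, -1.0], [2.0, 0.25]]
    k, v = [0.6, 0.8], [1.0, -2.0]
    # Constant gates equal to beta recover the single-gate GDN step.
    print(gdn_step(S, k, v, alpha=0.9, beta=0.5))
    print(decoupled_step(S, k, v, 0.9, erase=[0.5, 0.5], write=[0.5, 0.5]))
    # Erase only key channel 0 and write nothing: column 1 of S is left alone.
    print(decoupled_step(S, k, v, 1.0, erase=[1.0, 0.0], write=[0.0, 0.0]))
```

In this sketch, setting every entry of `erase` and `write` to the same scalar gives back the single-gate GDN step, so the decoupled form is a strict generalization under these assumptions.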

May 22 at 6:05 PM