Sebastian Raschka, PhD (@rasbt): "Just caught up with the recent GLM-5.2 release. The best open-weight model today. Architecture-wise, it's built on the GLM-5 and GLM-5.1 architecture that I covered previously, which means it's reusing the Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA) me…"

Just caught up with the recent GLM-5.2 release. The best open-weight model today.

Architecture-wise, it's built on the GLM-5 and GLM-5.1 architecture that I covered previously, which means it's reusing the Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA) mechanisms from DeepSeek V3.2. (I wrote about it here: magazine.sebastianrasch…)

What's new is an IndexShare mechanism, a cross-layer reuse trick for DSA. Instead of recomputing the sparse-attention top-k indexer in every layer, GLM-5.2 runs the full indexer only once every four layers and lets the following layers reuse the selected token indices. This keeps the same DSA idea but makes 1M-token inference much cheaper.
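To make the reuse pattern concrete, here is a minimal toy sketch in plain Python. It is not GLM-5.2's implementation: `toy_indexer_scores`, `top_k_indices`, and `select_tokens_per_layer` are hypothetical names, the scores are random stand-ins for a learned indexer, and the sketch handles a single query position. It also assumes the first layer of each group of four is the one that runs the full indexer, which the post does not specify.

```python
import random

# From the post: the full indexer runs once every four layers.
FULL_INDEXER_EVERY = 4


def toy_indexer_scores(layer, num_tokens, seed=0):
    """Hypothetical stand-in for the DSA indexer: one relevance score per token.

    A real indexer would compute learned scores; random values are used here
    only so the sketch runs on its own.
    """
    rng = random.Random(seed * 1000 + layer)
    return [rng.random() for _ in range(num_tokens)]


def top_k_indices(scores, k):
    """Return the positions of the k highest scores, in ascending position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_tokens_per_layer(num_layers, num_tokens, k, share_every=FULL_INDEXER_EVERY):
    """Pick the attended token indices for each layer, sharing them across layers.

    Assumption: layers 0, share_every, 2 * share_every, ... run the indexer,
    and the layers in between reuse the most recent selection.
    """
    selected = []
    indexer_calls = 0
    cached = None
    for layer in range(num_layers):
        if layer % share_every == 0:
            # Full indexer pass: score all tokens and keep the top k.
            cached = top_k_indices(toy_indexer_scores(layer, num_tokens), k)
            indexer_calls += 1
        # Every layer attends only to the cached indices.
        selected.append(cached)
    return selected, indexer_calls


selected, calls = select_tokens_per_layer(num_layers=8, num_tokens=32, k=4)
print(f"indexer runs: {calls} of 8 layers")
```

With 8 layers and sharing every four layers, the indexer runs twice (at layers 0 and 4) instead of eight times. The sparse attention itself still happens in every layer; only the token selection step is skipped.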

Ahead of AI

From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates

Jun 18

2:16 PM
