Just caught up with the recent GLM-5.2 release. The best open-weight model today.

Architecture-wise, it's built on the GLM-5 and GLM-5.1 architecture that I covered previously, which means it reuses the Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA) mechanisms from DeepSeek V3.2. (I wrote about it here: magazine.sebastianrasch…)

What's new is that they added an IndexShare mechanism. (That's a cross-layer reuse trick for DSA: instead of recomputing the sparse-attention top-k indexer in every layer, GLM-5.2 runs the full indexer only once every four layers and lets the following three layers reuse the selected token indices. This keeps the same DSA idea but makes 1M-token inference much cheaper.)
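To make the reuse pattern concrete, here is a minimal toy sketch in plain Python. It is not GLM-5.2's actual code: the dot-product scoring, the function names (`indexer_top_k`, `select_indices_per_layer`), the choice of which layer in each group runs the indexer, and all sizes are assumptions for illustration. The only detail taken from the post is the "once every four layers" schedule.

```python
import random

SHARE_EVERY = 4  # from the post: run the full indexer once every four layers


def indexer_top_k(query, keys, k):
    """Toy indexer (assumed): score each token by dot product, keep the top-k positions."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_indices_per_layer(queries, keys, k, share_every=SHARE_EVERY):
    """Return (selected indices per layer, number of full indexer runs)."""
    selected, indexer_runs, cached = [], 0, None
    for layer, query in enumerate(queries):
        if layer % share_every == 0:
            # First layer of each group: run the full indexer.
            cached = indexer_top_k(query, keys, k)
            indexer_runs += 1
        # Remaining layers of the group reuse the cached token indices.
        selected.append(cached)
    return selected, indexer_runs


# Toy sizes (hypothetical); one shared key set stands in for per-layer states.
rng = random.Random(0)
num_layers, seq_len, dim, k = 8, 32, 4, 5
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]

selected, runs = select_indices_per_layer(queries, keys, k)
print(f"{runs} indexer runs for {num_layers} layers")
```

With eight layers, the indexer runs twice instead of eight times, and layers 1 to 3 attend over exactly the token positions layer 0 picked. The indexer cost has to be paid over the full context, so skipping three out of four runs is where the savings at long sequence lengths would come from in this simplified picture.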

Jun 18 at 2:16 PM