Benjamin Marie (@bnjmnmarie):


I tested Unsloth’s UD Q4 and Q3 GGUF quantizations of Qwen3.5-397B-A17B and they both performed very well.

In my runs, I didn’t observe a meaningful difference between the original weights and Q3 (less than 1 point of accuracy difference, so only a ~3.5% relative error increase).
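For readers unfamiliar with the distinction, an absolute accuracy drop (in points) and a relative error increase are different quantities: the second divides the change in error rate by the baseline error rate. A minimal sketch of that arithmetic follows; the function name and the accuracy values are hypothetical and chosen only for illustration, they are not the scores from these runs.

```python
def relative_error_increase(acc_ref: float, acc_quant: float) -> float:
    """Relative increase in error rate, given accuracies in percent."""
    err_ref = 100.0 - acc_ref      # error rate of the original weights
    err_quant = 100.0 - acc_quant  # error rate of the quantized model
    if err_ref <= 0:
        raise ValueError("reference error rate must be positive")
    return (err_quant - err_ref) / err_ref


# Hypothetical accuracies (NOT the measured ones): a 0.7-point drop
# on a baseline with a 20% error rate.
ref_acc, quant_acc = 80.0, 79.3
print(f"{relative_error_increase(ref_acc, quant_acc):.1%}")  # prints 3.5%
```

The same absolute drop gives a larger relative increase when the baseline error rate is small, which is why it helps to report both figures.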

You can cut roughly 500 GB from the memory footprint while seeing little to no practical degradation (at least on the tasks I tried).

The Kaitchup – AI on a Budget

Qwen3.5: Scaling Hybrid Attention to 397B Parameters

Feb 20

6:02 AM
