I tested Unsloth’s UD Q4 and Q3 GGUF quantizations of Qwen3.5-397B-A17B, and both performed very well.

In my runs, I didn’t observe a meaningful difference between the original weights and Q3: accuracy dropped by less than 1 point, which works out to only a ~3.5% relative increase in error rate.
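
For anyone wondering how an accuracy gap in points turns into a relative error increase, here is a minimal sketch of the arithmetic. The scores below (72.0 and 71.0) are hypothetical placeholders chosen only to illustrate the calculation, not my actual benchmark numbers, and the function name is made up for this example.

```python
def relative_error_increase(baseline_acc: float, quantized_acc: float) -> float:
    """Relative increase in error rate, in percent.

    Both inputs are accuracies in percent (0-100). The error rate is
    100 - accuracy, and the result is the growth of that error rate
    relative to the baseline.
    """
    baseline_err = 100.0 - baseline_acc
    quantized_err = 100.0 - quantized_acc
    return (quantized_err - baseline_err) / baseline_err * 100.0


# Hypothetical scores, for illustration only (not measured results).
baseline = 72.0   # original weights
quantized = 71.0  # Q3, one point lower

increase = relative_error_increase(baseline, quantized)
print(f"Relative error increase: {increase:.1f}%")
```

The same 1-point drop looks larger in relative terms when the baseline is already very accurate, since the baseline error rate in the denominator shrinks.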

You can cut roughly 500 GB of memory footprint while seeing little to no practical degradation (at least on the tasks I tried).
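
As a rough sanity check on that memory figure, here is a back-of-the-envelope sketch. It assumes the original weights are stored at 16 bits per parameter, and it uses assumed effective bits-per-weight values for the quants (about 4.5 for Q4 and 3.5 for Q3). Those values are illustrative guesses, not measured file sizes, and the estimate ignores KV cache and runtime overhead.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 397e9  # total parameter count, taken from the model name

# Assumed effective bits per weight. The quantized values are
# illustrative guesses, not read from the actual GGUF files.
BITS_PER_WEIGHT = {
    "16-bit original (assumed)": 16.0,
    "Q4 (assumed ~4.5 bpw)": 4.5,
    "Q3 (assumed ~3.5 bpw)": 3.5,
}

baseline_gb = weights_gb(N_PARAMS, 16.0)
for name, bpw in BITS_PER_WEIGHT.items():
    size_gb = weights_gb(N_PARAMS, bpw)
    saved_gb = baseline_gb - size_gb
    print(f"{name}: ~{size_gb:.0f} GB (saves ~{saved_gb:.0f} GB)")
```

Under these assumptions, both quants land several hundred GB below the 16-bit weights. Mixed-precision schemes keep some tensors at higher precision, so real file sizes will differ from this estimate.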

Qwen3.5: Scaling Hybrid Attention to 397B Parameters
Feb 20 at 6:02 AM