List of quantized Gemma 4 31B checkpoints I’m evaluating:

  • Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB)

  • cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB)

  • RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB)

  • nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB)

  • RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB)

→ yes, NVIDIA’s NVFP4 checkpoint is almost as large as the FP8 one (32.7 GB vs. 33.3 GB). This is what happens when you don’t quantize the attention layers of a dense model.
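
A quick back-of-envelope sketch of why the size gap matters: divide each checkpoint size by the parameter count to get the average stored bits per weight. This assumes (not stated in the post) that sizes are decimal GB, that the model has exactly the nominal 31e9 parameters, and that every byte is a weight. NVFP4 is taken as ~4.5 bits per weight (4-bit values plus one 8-bit scale per 16-element block, ignoring the tiny per-tensor scale), and `mixed_size_gb` is a hypothetical helper for a checkpoint that keeps some fraction of weights in BF16.

```python
# Back-of-envelope: effective bits per weight implied by each checkpoint size.
# Assumptions (not from the post): sizes are decimal GB (1e9 bytes), the model
# has exactly 31e9 parameters (the nominal "31B"), and all bytes are weights.
NOMINAL_PARAMS = 31e9

checkpoints_gb = {
    "Intel/gemma-4-31B-it-int4-AutoRound": 19.2,
    "cyankiwi/gemma-4-31B-it-AWQ-4bit": 20.5,
    "RedHatAI/gemma-4-31B-it-NVFP4": 23.3,
    "nvidia/Gemma-4-31B-IT-NVFP4": 32.7,
    "RedHatAI/gemma-4-31B-it-FP8-block": 33.3,
}


def bits_per_weight(size_gb: float, params: float = NOMINAL_PARAMS) -> float:
    """Average stored bits per parameter for a checkpoint of size_gb."""
    return size_gb * 1e9 * 8 / params


def mixed_size_gb(bf16_fraction: float, params: float = NOMINAL_PARAMS) -> float:
    """Hypothetical estimate: a fraction of weights stays BF16 (16 bits) and
    the rest is NVFP4 at ~4.5 bits (4-bit values + an 8-bit scale per 16)."""
    bits = bf16_fraction * 16 + (1 - bf16_fraction) * 4.5
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, gb in checkpoints_gb.items():
        print(f"{name:40s} {gb:5.1f} GB  ~{bits_per_weight(gb):.2f} bits/weight")
    print(f"all NVFP4: ~{mixed_size_gb(0.0):.1f} GB, all BF16: ~{mixed_size_gb(1.0):.1f} GB")
```

Under these assumptions, a fully NVFP4 checkpoint would land around 17.4 GB, while 32.7 GB works out to roughly 8.4 bits per weight, i.e. about FP8 territory. The real split depends on which tensors a given checkpoint leaves unquantized.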

Apr 6 at 2:39 PM