List of quantized Gemma 4 31B checkpoints I’m evaluating:
Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB)
cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB)
RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB)
nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB)
RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB)
→ Yes, NVIDIA’s NVFP4 checkpoint (32.7 GB) is nearly as large as the FP8 checkpoint (33.3 GB). This is what happens when you don’t quantize the attention layers of a dense model.
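To make the size comparison concrete, here is a rough sketch that converts each checkpoint size into average bits per parameter. It assumes exactly 31e9 parameters (taken from the "31B" in the model name, so only nominal) and 1 GB = 1e9 bytes. The `bits_per_param` helper is just for illustration, and the figures include everything stored in the files (embeddings, scales, any unquantized layers), so treat them as ballpark values.

```python
# Assumption: ~31e9 parameters (nominal, from "31B") and 1 GB = 1e9 bytes.
PARAMS = 31e9

# Checkpoint sizes in GB, as listed above.
checkpoints = {
    "Intel/gemma-4-31B-it-int4-AutoRound": 19.2,
    "cyankiwi/gemma-4-31B-it-AWQ-4bit": 20.5,
    "RedHatAI/gemma-4-31B-it-NVFP4": 23.3,
    "nvidia/Gemma-4-31B-IT-NVFP4": 32.7,
    "RedHatAI/gemma-4-31B-it-FP8-block": 33.3,
}


def bits_per_param(size_gb: float, params: float = PARAMS) -> float:
    """Average storage cost per parameter, in bits (hypothetical helper)."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in checkpoints.items():
    print(f"{name}: {bits_per_param(size_gb):.2f} bits/param")
```

A nominally 4-bit checkpoint that lands well above 4 bits/param probably keeps a sizeable share of its weights in higher precision, which fits the gap between the two NVFP4 checkpoints.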