Benjamin Marie (@bnjmnmarie): "List of quantized Gemma 4 31B I’m evaluating: Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB) cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB) RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB) nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB) RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB) → ye…"

List of quantized Gemma 4 31B I’m evaluating:

Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB)
cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB)
RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB)
nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB)
RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB)

→ Yes, NVIDIA’s NVFP4 checkpoint (32.7 GB) is almost as large as the FP8 checkpoint (33.3 GB). This is what happens when you don’t quantize the attention layers of a dense model.
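
To make the size comparison concrete, here is a rough Python sketch that converts each listed checkpoint size into effective bits per parameter. It is a back-of-the-envelope estimate, not a measurement: it assumes decimal gigabytes, a nominal 31e9 parameters taken from the "31B" in the model name, and about 4.5 bits per NVFP4 weight (4-bit values plus one 8-bit scale per 16-weight block). The helper names are hypothetical. The second helper estimates what share of parameters would have to remain at 16 bits to explain an NVFP4 checkpoint's size under those assumptions.

```python
# Checkpoint sizes in GB, copied from the list in the post.
CHECKPOINTS_GB = {
    "Intel/gemma-4-31B-it-int4-AutoRound": 19.2,
    "cyankiwi/gemma-4-31B-it-AWQ-4bit": 20.5,
    "RedHatAI/gemma-4-31B-it-NVFP4": 23.3,
    "nvidia/Gemma-4-31B-IT-NVFP4": 32.7,
    "RedHatAI/gemma-4-31B-it-FP8-block": 33.3,
}

# Assumption: nominal parameter count read off the "31B" in the model name.
N_PARAMS = 31e9


def effective_bits_per_param(size_gb, n_params=N_PARAMS):
    """Average storage cost per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / n_params


def implied_16bit_fraction(bits, low_bits=4.5, high_bits=16.0):
    """Share of parameters left at high_bits if the rest cost low_bits each.

    Solves bits = f * high_bits + (1 - f) * low_bits for f.
    low_bits=4.5 is an assumed NVFP4 cost (4-bit value + shared 8-bit scale).
    """
    return (bits - low_bits) / (high_bits - low_bits)


for name, size_gb in CHECKPOINTS_GB.items():
    bits = effective_bits_per_param(size_gb)
    line = f"{name}: {bits:.2f} bits/param"
    if "NVFP4" in name:
        # Only meaningful for the NVFP4 checkpoints, given the 4.5-bit assumption.
        line += f", ~{implied_16bit_fraction(bits):.0%} of params at 16-bit"
    print(line)
```

Under these assumptions the NVIDIA NVFP4 checkpoint comes out at roughly 8.4 bits per parameter, against about 8.6 for the FP8 checkpoint, which would correspond to around a third of the parameters staying in 16-bit precision. The estimate ignores embeddings, metadata, and any vision components, so treat it as an order-of-magnitude check only.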

Apr 6, 2:39 PM
