I compared NVFP4 to common 4-bit paths (AWQ, AutoRound, bitsandbytes) on an RTX PRO 6000. Accuracy was broadly similar; in my runs, INT4 (AWQ/AutoRound) was slightly ahead of NVFP4/NVFP4A16 on some tasks.
NVFP4 models were larger than typical INT4 ones (about 7 GB more for Llama 3.3), but throughput was the differentiator: with activation quantization, NVFP4 reached about 2.35x the tokens/sec of INT4 on Blackwell.
Using NVFP4A16 (weights only) removed most of that speedup.
Practical takeaway: if you’re on Blackwell and care primarily about inference speed, NVFP4 with activation quantization is a good default. If storage is tight or you want every last bit of accuracy, INT4 remains a solid option.
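If you want to check the throughput ratio on your own hardware, here is a minimal sketch of how a tokens/sec comparison could be set up. It is not the harness I used for the numbers above. The names `tokens_per_second` and `speedup` are hypothetical helpers, and `generate` is an assumed callable that you would wrap around your own inference engine so that it returns one list of generated token IDs per prompt.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[Sequence[str]], Sequence[Sequence[int]]],
    prompts: Sequence[str],
    warmup_runs: int = 1,
    timed_runs: int = 3,
) -> float:
    """Return generated tokens per second, aggregated over `timed_runs` passes.

    `generate` is a hypothetical wrapper around an inference engine: it takes
    a batch of prompts and returns the generated token IDs for each prompt.
    """
    # Warm-up passes absorb one-time costs (kernel compilation, cache setup)
    # so they do not distort the timed passes.
    for _ in range(warmup_runs):
        generate(prompts)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(timed_runs):
        start = time.perf_counter()
        outputs = generate(prompts)
        total_seconds += time.perf_counter() - start
        # Count only generated tokens, not prompt tokens.
        total_tokens += sum(len(token_ids) for token_ids in outputs)

    if total_seconds <= 0.0:
        raise ValueError("elapsed time was zero; use more prompts or runs")
    return total_tokens / total_seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    if baseline_tps <= 0.0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps
```

To compare two checkpoints, call `tokens_per_second` once per model with the same prompts and generation length, then pass the two results to `speedup`. Keeping batch size, prompt set, and maximum output length identical across runs matters more than the exact timing code, since throughput is sensitive to all three.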