I quantized LFM2.5 into several 4-bit and 8-bit variants for fast inference in vLLM. All models were tested on vLLM 0.13 with an RTX 4090.
FP8: Great speed with minimal accuracy loss if you’re on a recent Ada / Hopper / Blackwell GPU.
NVFP4: If you have a Blackwell GPU, this should be the fastest option. Expect a more noticeable accuracy drop due to 4-bit activations (full evals coming soon).
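The FP8 vs. NVFP4 rule of thumb above can be written as a small helper keyed on the GPU's CUDA compute capability. This is only a rough sketch. The name `pick_variant` and the `prefer_accuracy` flag are hypothetical, and the capability cutoffs (8.9 for Ada, 9.x for Hopper, 10 and up for Blackwell) are my assumption about how to detect those architectures. None of this is a vLLM API.

```python
from typing import Optional, Tuple


def pick_variant(capability: Tuple[int, int], prefer_accuracy: bool = False) -> Optional[str]:
    """Suggest a quantized variant from a CUDA compute capability (major, minor).

    Hypothetical helper: it encodes the FP8/NVFP4 guidance, nothing more.
    Returns None when neither FP8 nor NVFP4 is a good fit.
    """
    major, minor = capability
    # Assumption: Blackwell parts report major >= 10 (e.g. 10.x datacenter,
    # 12.x RTX 50-series). NVFP4 is the fastest option there, while FP8
    # trades some speed for a smaller accuracy drop.
    if major >= 10:
        return "FP8" if prefer_accuracy else "NVFP4"
    # Ada (8.9) and Hopper (9.x) have native FP8 tensor cores.
    if (major, minor) >= (8, 9):
        return "FP8"
    # Older GPUs (e.g. Ampere 8.0/8.6): pick one of the other variants.
    return None
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current device. On an RTX 4090 that is `(8, 9)`, so the helper suggests FP8.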