Benjamin Marie (@bnjmnmarie):

DFlash can be faster than MTP.

But there is no universal winner. It depends on the model, the hardware, the task, the workload, and many other details.

I benchmarked MTP vs DFlash with vLLM and llama.cpp across math, coding, and chat workloads.

For Qwen3.6 27B, DFlash delivered up to a 4x speedup. MTP was not far behind, once tuned, but DFlash reached the highest peak performance.

For Qwen3.6 35B A3B, MTP often performed better.

All the results here:

DFlash vs MTP: Qwen3.6 Speculative Decoding Benchmarks with vLLM and llama.cpp

Jun 3

12:05 AM