On Sunday I wrote about speculative decoding, and immediately we get Qwen3.6 with MTP and support in llama.cpp: huggingface.co/unsloth/…
I just tested it and it looks really promising. I'll report back with some numbers.