Google's TurboQuant compresses LLM memory by roughly 6x with near-zero accuracy loss.
That means models that once needed expensive GPUs can now run on a 16GB Mac Mini.
No retraining. No fine-tuning. Just drop it in.
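For a feel of where the savings come from: the sketch below is generic low-bit symmetric quantization, not TurboQuant's actual method (the paper describes an online vector quantization scheme), and all names here are illustrative. Storing 4-bit codes plus a scale instead of 16-bit floats gives a 4x reduction; codes near 2.7 bits per value would yield the ~6x figure above.

```python
# Generic low-bit symmetric quantization -- an illustration only,
# NOT TurboQuant's algorithm. Shows how replacing 16-bit floats
# with small integer codes plus one scale factor shrinks memory.

def quantize(values, bits=4):
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

vals = [0.12, -0.5, 0.33, 0.9, -0.77]
codes, scale = quantize(vals, bits=4)
approx = dequantize(codes, scale)

# Each reconstructed value lands within half a quantization step
# of the original; memory drops from 16 bits to 4 bits per value.
for v, a in zip(vals, approx):
    assert abs(v - a) <= scale / 2 + 1e-9
```

Real schemes add per-block scales and smarter code assignment, but the memory math is the same: fewer bits per stored value, bounded reconstruction error.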
Developers have already built working implementations straight from the paper, before Google released any code.
Local AI just got a lot more real.
research.google