A Survey of Efficient LLM Inference Serving
This survey provides a comprehensive taxonomy of recent system-level innovations for efficient LLM inference serving.
A great overview for developers working on inference.