Nathan Lambert (@natolambert):

RLHF Book status update: lots of great changes.

Over the past month I've been doing a top-to-bottom update to the RLHF book. All of these changes are reflected on the website rlhfbook.com and will soon be carried over to the Manning early access version (MEAP), with more improvements to follow for the physical copy.

Overall, this took the PDF from ~150 to ~200 pages, and the book is much more well-rounded now.

Some of the larger changes:

Updated the RL chapter to add more algorithms like GSPO, CISPO, etc.
Updated the big table of reasoning model tech reports (full list below).
Added a section on Rubrics for RLVR.
Updated the text in many chapters to better reflect best practices of today.
Many clarity fixes throughout, adding better transitions, introductions, etc.
More consistent notation throughout the book.

I strongly recommend taking another look if you last looked in the first half of 2025. There are also many surprising details, such as fixes to the attached RLHF system diagram, which you may recognize from my first HuggingFace RLHF blog post in December of 2022. It had a bunch of minor errors.

Next, I'm going to focus on making the physical Manning book great. The content will flow more smoothly than the web version (I'm trying not to change the links there), for example by linking the constitutional AI and synthetic data chapters. Overall, this should make it read better from front to back. All the diagrams and content will also be designed to have a much more elegant presentation.

Thanks for reading and for the feedback!

Reasoning model reports I recommend reading:

2025-01-22 - DeepSeek R1 - arxiv.org/abs/2501.12948

2025-01-22 - Kimi 1.5 - arxiv.org/abs/2501.12599

2025-03-31 - Open-Reasoner-Zero - arxiv.org/abs/2503.24290

2025-04-10 - Seed-Thinking 1.5 - arxiv.org/abs/2504.13914

2025-04-30 - Phi-4 Reasoning - arxiv.org/abs/2504.21318

2025-05-02 - Llama-Nemotron - arxiv.org/abs/2505.00949

2025-05-12 - INTELLECT-2 - arxiv.org/abs/2505.07291

2025-05-12 - Xiaomi MiMo - arxiv.org/abs/2505.07608

2025-05-14 - Qwen 3 - arxiv.org/abs/2505.09388

2025-05-21 - Hunyuan-TurboS - arxiv.org/abs/2505.15431