Nathan Lambert (@natolambert)


Claude Code with Opus 4.5 driving, OpenAI's Codex for code review, and GPT Pro for planning made a working DPO (and related algorithms) repository from scratch for my RLHF book, and the curves are looking right. It's running on the DGX Spark, fine-tuning OLMo 2 1B SFT. Built by referencing the original repositories + TRL.

We're living in the future.


Add direct alignment algorithms (DPO, IPO, SimPO, ORPO, KTO) by natolambert · Pull Request #226 · natolambert/rlhf-book

Summary: Implements educational direct alignment algorithms for Chapter 12. 6 algorithms: DPO, cDPO, IPO, SimPO, ORPO, KTO. Default model: allenai/OLMo-2-0425-1B-SFT. Default dataset: argilla/ultrafee...

3:41 PM · Feb 1
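For readers unfamiliar with the objectives named in the PR, here is a minimal sketch of the per-pair DPO and IPO losses in plain Python. It is not the code from PR #226: the function names are hypothetical, the inputs are assumed to be summed log-probabilities of whole responses under the policy and a frozen reference model, and the formulas follow the standard published forms of each loss.

```python
import math


def log_ratio_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
    """Chosen-minus-rejected difference of policy/reference log-ratios.

    Each argument is a summed log-probability log p(y | x) for one response
    (hypothetical scalar inputs; a real trainer derives them from model logits).
    """
    return (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: -log sigmoid(beta * margin), computed as softplus(-beta * margin)."""
    h = log_ratio_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    return softplus(-beta * h)


def ipo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, tau=0.1):
    """IPO: squared regression of the margin toward the target 1 / (2 * tau)."""
    h = log_ratio_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    return (h - 1.0 / (2.0 * tau)) ** 2


if __name__ == "__main__":
    # Policy prefers the chosen response more than the reference does (margin = 2).
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
    print(ipo_loss(-12.0, -15.0, -13.0, -14.0))
```

The design difference shows up in the shape of the loss: DPO keeps rewarding a larger margin (its loss only approaches zero as the margin grows), while IPO's squared target pulls the margin toward a fixed value of 1/(2τ), which is the usual motivation given for IPO as a guard against overfitting the preference data.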

