Claude Code with Opus 4.5 driving, OpenAI's Codex for code review, and GPT Pro for planning built a working DPO (and related algorithms) repository from scratch for my RLHF book, and the curves are looking right. Running on the DGX Spark, fine-tuning OLMo 2 1B SFT. Built by referencing the original repositories + TRL.