Claude Code with Opus 4.5 driving, OpenAI's Codex for code review, and GPT Pro for planning built a working DPO (and related algorithms) repository from scratch for my RLHF book, and the curves are looking right. Running on the DGX Spark, fine-tuning OLMo 2 1B SFT. Built by referencing the original repositories + TRL.

We're living in the future.

Feb 1 at 3:41 PM