Cameron R. Wolfe, Ph.D. (@cwolferesearch): "The recording of my talk on rubric RL last week is now available online! You can find it here: https://www.byhand.ai/p/recording-ppo-dpo-grpo-rubrics The full session starts with a from-scratch / hand-written explanation of key RL algorithms like PPO, DPO and GRPO from Professo…"

The recording of my talk on rubric RL last week is now available online! You can find it here: https://www.byhand.ai/p/recording-ppo-dpo-grpo-rubrics

The full session starts with a from-scratch, hand-written explanation of key RL algorithms (PPO, DPO, and GRPO) from Professor Tom Yeh (AI by Hand). After covering RL fundamentals, I explained more recent topics related to RLVR and how large-scale RL can be extended to non-verifiable domains with rubrics. I hope it is helpful!

AI by Hand ✍️

PPO → DPO → GRPO → Rubrics

Mar 4

5:19 PM
