The recording of my talk on rubric RL last week is now available online! You can find it here:

The full session starts with a from-scratch, hand-written explanation of key RL algorithms like PPO, DPO and GRPO from Professor Tom Yeh (AI by Hand). After the RL fundamentals were covered, I explained some more recent topics related to RLVR and how large-scale RL can be extended to non-verifiable domains with rubrics. I hope it is helpful!

Mar 4 at 5:19 PM