Some more really good papers on rubric rewards that I've been reading:

TL;DR: Rubric rewards are really cool. There has been a lot of great recent progress, more than I expected. There's also a lot more to be done, and making progress on truly subjective tasks seems to be noticeably more difficult.

My favorite paper so far is the first in this list, which proposes an alternating RL framework for jointly training a rubric generator and a rubric-based reward model. There is still a lot to figure out w.r.t. making rubrics work well, but this paper shows a really clear benefit from rubric-based RL and has an interesting setup for making joint training (of the rubric generator and generative reward model) more stable.

There are still many areas for improvement for rubrics. For example, it seems rubrics still work best for constraints that are more objective, whereas very open-ended tasks (e.g., properly-styled creative writing) are still going to be quite tough. The benefit of rubrics is not uniform across domains, and it's not immediately clear for which domains rubrics will work best; e.g., instruction following tends to benefit a lot from rubrics, whereas the benefit is less clear for things like science / medicine.

Interestingly, a lot of papers tackling very open-ended tasks with rubrics are also formulating evaluation as a pairwise problem. Given two completions, they ask the rubric generator to produce a rubric that will properly distinguish / rank the chosen and rejected completion in the pair. This probably makes very subjective evaluations easier, but the application to online RL is also less straightforward. We can't just compute the reward for a single completion; we have to somehow create a pairwise comparison to compute the reward.
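To make the online RL issue concrete, here is a minimal sketch of one possible workaround: sample a group of completions for the same prompt, compare every pair, and use each completion's win rate as its scalar reward. This is my own illustration and not the method of any paper mentioned here. The names `pairwise_to_rewards` and `toy_prefer` are hypothetical, and `toy_prefer` is just a stand-in for a real rubric-based pairwise judge.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def pairwise_to_rewards(
    completions: Sequence[str],
    prefer: Callable[[str, str], bool],
) -> List[float]:
    """Turn pairwise preferences into one scalar reward per completion.

    `prefer(a, b)` is an assumed judge that returns True when `a` beats `b`.
    The reward is the fraction of the other completions that each one beats.
    """
    n = len(completions)
    if n < 2:
        raise ValueError("need at least two completions to compare")

    wins = [0] * n
    # Compare every unordered pair exactly once.
    for i, j in combinations(range(n), 2):
        if prefer(completions[i], completions[j]):
            wins[i] += 1
        else:
            wins[j] += 1

    # Each completion takes part in n - 1 comparisons.
    return [w / (n - 1) for w in wins]


def toy_prefer(a: str, b: str) -> bool:
    """Placeholder judge (assumption): prefers the longer completion."""
    return len(a) > len(b)


if __name__ == "__main__":
    group = ["a", "abc", "ab"]
    print(pairwise_to_rewards(group, toy_prefer))
```

The cost is quadratic in the group size, since every pair needs a judge call, which is part of why the pairwise setup is less straightforward for online RL than a pointwise reward.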

Feb 11 at 4:04 AM