I’m publishing a long-form overview of using rubrics for RL tomorrow. Here are all of the papers that it will cover. Am I missing anything?
[1] Gunjal, Anisha, et al. "Rubrics as rewards: Reinforcement learning beyond verifiable domains." arXiv preprint arXiv:2507.17746 (2025).
[2] Huang, Zenan, et al. "Reinforcement learning with rubric anchors." arXiv preprint arXiv:2508.12790 (2025).
[3] Liu, Tianci, et al. "OpenRubrics: Towards scalable synthetic rubric generation for reward modeling and LLM alignment." arXiv preprint arXiv:2510.07743 (2025).
[4] Shao, Rulin, et al. "DR Tulu: Reinforcement learning with evolving rubrics for deep research." arXiv preprint arXiv:2511.19399 (2025).
[5] Xu, Ran, et al. "Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training." arXiv preprint arXiv:2602.01511 (2026).
[6] Xu, Wenyuan, et al. "A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization." arXiv preprint arXiv:2504.04950 (2025).
[7] Zheng, Lianmin, et al. "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena." Advances in Neural Information Processing Systems 36 (2023): 46595-46623.
[8] Viswanathan, Vijay, et al. "Checklists are better than reward models for aligning language models." arXiv preprint arXiv:2507.18624 (2025).
[9] Mu, Tong, et al. "Rule-based rewards for language model safety." Advances in Neural Information Processing Systems 37 (2024): 108877-108901.
[10] Gupta, Taneesh, et al. "CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling." Findings of the Association for Computational Linguistics: ACL 2025. 2025.
[11] Wu, Mian, et al. "RLAC: Reinforcement learning with adversarial critic for free-form generation tasks." arXiv preprint arXiv:2511.01758 (2025).
[12] Xie, Lipeng, et al. "Auto-Rubric: Learning to extract generalizable criteria for reward modeling." arXiv preprint arXiv:2510.17314 (2025).
[13] Bai, Yuntao, et al. "Constitutional AI: Harmlessness from AI feedback." arXiv preprint arXiv:2212.08073 (2022).
[14] Guan, Melody Y., et al. "Deliberative alignment: Reasoning enables safer language models." arXiv preprint arXiv:2412.16339 (2024).
[15] Liu, Yang, et al. "G-Eval: NLG evaluation using GPT-4 with better human alignment." arXiv preprint arXiv:2303.16634 (2023).
[16] Arora, Rahul K., et al. "HealthBench: Evaluating large language models towards improved human health." arXiv preprint arXiv:2505.08775 (2025).
[17] Deshpande, Kaustubh, et al. "MultiChallenge: A realistic multi-turn conversation evaluation benchmark challenging to frontier LLMs." Findings of the Association for Computational Linguistics: ACL 2025. 2025.