Cameron R. Wolfe, Ph.D. (@cwolferesearch)

Interested in learning how to run RL at scale? Here are the best resources to read…

Research on Scaling RL

The Art of Scaling RL Compute for LLMs: arxiv.org/abs/2510.13786
Scaling Behaviors of LLM RL Post-Training: arxiv.org/abs/2509.25300
Optimally Scaling Sampling Compute for LLM RL: arxiv.org/abs/2603.12151
Scaling Up RL: arxiv.org/abs/2507.12507
ProRL V2 - Prolonged Training Validates RL Scaling Laws: hijkzzz.notion.site/pro…
Polaris - A Recipe for Scaling RL with Reasoning Models: hkunlp.github.io/blog/2…

RL Frameworks

HybridFlow (outline of the verl framework): arxiv.org/abs/2409.19256
1. More up-to-date info can be found here: arxiv.org/abs/2601.18150
AReaL - Large-Scale Async RL: arxiv.org/abs/2505.24298
PipelineRL - Fast On-Policy RL: arxiv.org/abs/2509.19128
AsyncFlow - Async Streaming RL: arxiv.org/abs/2507.01663

RL for Agents

DeepSWE - Open Coding Agent Trained w/ RL: together.ai/blog/deepswe
AutoForge - Environment Synthesis for Agentic RL: arxiv.org/abs/2512.22857
Agent-R1 - Training Agents w/ End-to-End RL: arxiv.org/abs/2511.14460
AgentRL - Scaling RL for Multi-Turn, Multi-Task Agents: arxiv.org/abs/2510.04206
The Landscape of Agentic RL: arxiv.org/abs/2509.02547
Training SWE Agents with RL: arxiv.org/abs/2508.03501

Case Studies & Tech Reports

Kimi tech reports:
1. Kimi K2 - Open Agentic Intelligence: arxiv.org/abs/2507.20534
2. Kimi End-to-End Agentic RL: moonshotai.github.io/Ki…
3. Kimi k1.5 - Scaling RL for LLMs: arxiv.org/abs/2501.12599
Composer series from Cursor:
1. Composer 2: arxiv.org/abs/2603.24477
2. Composer 2.5: cursor.com/blog/compose…
Olmo 3 (also has open code / data): arxiv.org/abs/2512.13961
MiniMax tech reports:
1. MiniMax-M2: arxiv.org/abs/2605.26494
2. MiniMax-M1: arxiv.org/abs/2506.13585
Nemotron 3 (NVIDIA): arxiv.org/abs/2512.20856

Jun 2, 3:13 PM