The Journey of RL is a new series that traces the history of reinforcement learning through one core question: how did machines learn what to optimize?
Twelve parts. From Thorndike's puzzle box to GRPO. From behaviorist psychology to the verifiable turn. From Klopf and Sutton-Barto to where the reward hypothesis began to come apart.