Reasoning models like Grok-3 reasoning beta and DeepSeek-R1 are trained using reinforcement learning with verifiable rewards, but what exactly does this mean?
Verifiable tasks. One detail we should notice immediately is that reasoning models are primarily used for, and evaluated on, problems that are verifiable in nature; e.…
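To make the idea of a verifiable reward concrete, here is a minimal sketch in Python. It is an illustration rather than the reward used by any particular model: the function names are hypothetical, and it assumes the model is prompted to put its final answer inside `\boxed{...}`, a convention this sketch adopts purely for the example.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    # Assumption: the final answer is wrapped in \boxed{...}.
    # If several boxed spans appear, the last one is taken as the answer.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    # Rule-based check: 1.0 for an exact match with the reference answer,
    # 0.0 for a wrong or missing answer. No learned reward model is involved.
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0
```

For example, `verifiable_reward("so the answer is \\boxed{42}", "42")` returns `1.0`, while a completion with no boxed answer scores `0.0`. The point of the sketch is that the reward comes from a deterministic check against a known answer, which is what makes the task verifiable.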
The trajectory of research for open LLMs and open reasoning models has been shockingly similar, but there are still many open questions…
Phase One: Everything begins with the release of a powerful, open model. For general LLM research, this model was LLaMA, which enabled tons of downstream research (e.g., Alpaca, Vicuna, and Koala). For…