Cameron R. Wolfe, Ph.D. 

@cwolferesearch
ML @ Netflix • Rice University PhD • I make AI understandable

Reasoning models like Grok-3 reasoning beta and DeepSeek-R1 are trained using reinforcement learning with verifiable rewards, but what exactly does this mean?

Verifiable tasks. One detail that we should immediately notice about reasoning models is that they are primarily used for and evaluated on problems that are verifiable in nature; e.…
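To make "verifiable reward" concrete, here is a minimal sketch of a rule-based reward for a math-style task. It assumes the model reports its final answer inside `\boxed{...}` and that correctness is judged by exact string match against a reference answer. Both are illustrative assumptions, and the function names are hypothetical. The point is that the reward comes from a deterministic check, not a learned reward model.

```python
import re
from typing import Optional

def extract_final_answer(completion: str) -> Optional[str]:
    # Assumed output format (hypothetical): the final answer appears as \boxed{...}.
    # If several boxed answers appear, take the last one.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference: str) -> float:
    # Binary reward: 1.0 if the extracted answer exactly matches the reference, else 0.0.
    # A missing or malformed answer earns no reward.
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0

print(verifiable_reward("So the total is \\boxed{42}.", "42"))  # 1.0
print(verifiable_reward("So the total is \\boxed{41}.", "42"))  # 0.0
```

Exact matching keeps the check simple and hard to game. A real pipeline would likely normalize answers (e.g., `0.5` vs `1/2`) or, for coding tasks, run test cases instead.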


The trajectory of research for open LLMs and open reasoning models has been shockingly similar, but there are still many open questions…

Phase One: Everything begins with the release of a powerful, open model. For general LLM research, this model was LLaMA, which enabled tons of downstream research (e.g., Alpaca, Vicuna, Koala, etc.). For…