Nathan Lambert (@natolambert)

New paper! Bringing ideas from meta RL into the LM RL domain to help solve the hardest problems with sequential attempts.

It's a self-reflection approach, but it can be generalized. LMs should learn from context when using RL on very hard problems, not just make more attempts from scratch (i.e., standard GRPO). Led by Teng Xiao.
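For contrast, here is a minimal sketch of the group-relative advantage used in standard GRPO, where each prompt gets G independent attempts that all start from scratch. The function name `grpo_advantages` and the `eps` term are illustrative assumptions, and this sketch uses the population standard deviation (implementations differ). It shows one reason independent attempts can stall on very hard problems: with a sparse 0/1 reward, a group in which every attempt fails yields all-zero advantages.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for G attempts at the same prompt:
    each reward is centered on the group mean and scaled by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Sparse 0/1 reward on a very hard problem: if all G=4 attempts fail,
# every advantage is zero, so this group adds no reward signal to the update.
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]

# One success among four attempts: the success gets a positive advantage,
# the failures get negative ones.
print(grpo_advantages([1.0, 0.0, 0.0, 0.0]))
```

Because each attempt in the group is sampled independently, a failed attempt tells the next one nothing; the only way to get signal on a hard prompt is to sample more attempts and hope one succeeds.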

arxiv.org

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

This paper introduces MR-Search, an in-context meta reinforcement learning (RL) formulation for agentic search with self-reflection. Instead of optimizing a policy within a single independent episode with sparse rewards, MR-Search trains a policy that conditions on past episodes and adapts its searc…
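A toy sketch of the rollout structure described above, where each new attempt is conditioned on earlier episodes (answer, sparse reward, and a short reflection note) instead of starting from scratch. It assumes a hypothetical `toy_policy` in place of an LM and a trivially checkable answer. It only illustrates multi-episode conditioning; it is not MR-Search's training objective or reflection format, and every name here (`Episode`, `toy_policy`, `sequential_attempts`) is made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Episode:
    answer: int
    reward: float      # sparse reward: 1.0 if correct, else 0.0
    reflection: str    # short note carried forward into later attempts


def toy_policy(candidates, context, rng):
    """Hypothetical stand-in for an LM policy. It samples an answer, skipping
    any answer that an earlier episode in its context marked as wrong."""
    ruled_out = {ep.answer for ep in context if ep.reward == 0.0}
    remaining = [c for c in candidates if c not in ruled_out] or list(candidates)
    return rng.choice(remaining)


def sequential_attempts(candidates, target, k, use_context=True, seed=0):
    """Run up to k attempts at one problem. With use_context=True each attempt
    sees all previous episodes; with use_context=False every attempt starts
    from an empty context, like independent samples."""
    rng = random.Random(seed)
    history = []
    for _ in range(k):
        context = history if use_context else []
        answer = toy_policy(candidates, context, rng)
        reward = 1.0 if answer == target else 0.0
        reflection = "correct" if reward else f"{answer} was wrong; try a different answer"
        history.append(Episode(answer, reward, reflection))
        if reward == 1.0:
            break
    return history


for ep in sequential_attempts(list(range(5)), target=3, k=5):
    print(ep.answer, ep.reward, ep.reflection)
```

With `use_context=True` the toy policy never repeats an answer its context marks as wrong, so it reaches the target within `len(candidates)` attempts. With `use_context=False` it can repeat failed answers, which loosely mirrors the "more attempts from scratch" baseline.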

Mar 16

5:38 PM