Cool research paper from Google.
This is what clever context engineering looks like.
It proposes Tool-Use Mixture (TUMIX), which leverages diverse tool-use strategies to improve reasoning.
This work shows how to get better reasoning from LLMs by running a bunch of diverse agents (text-only, code, search, etc.) in parallel and letting them share notes across a few rounds. Instead of brute-forcing more samples, it mixes strategies, stops when confident, and ends up both more accurate and cheaper.
Mix different agents, not just more of one: They ran 15 different agent styles (CoT, code execution, web search, guided variants). Each agent sees both the question and other agents’ past answers, then tries again. This back-and-forth makes the group smarter than any single agent.
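Here's a minimal sketch of what one of those refinement rounds looks like. Everything here is illustrative, not from the paper: `call_llm` is a stand-in for your model API, and the three agent styles stand in for the paper's 15.

```python
# Minimal sketch of one TUMIX-style refinement round.
# Hypothetical names throughout; three styles stand in for the paper's 15.

AGENT_STYLES = [
    "Answer with step-by-step text-only reasoning (CoT).",
    "Write Python code to compute the answer, then report the result.",
    "Propose web-search queries, summarize likely findings, then answer.",
]

def call_llm(prompt: str) -> str:
    """Stub: swap in a real client call (Gemini, OpenAI, etc.)."""
    raise NotImplementedError

def run_round(question: str, prior_answers: list[str]) -> list[str]:
    """Each agent sees the question plus every agent's previous answer."""
    shared_notes = "\n".join(f"- {a}" for a in prior_answers)
    answers = []
    for style in AGENT_STYLES:
        prompt = f"{style}\n\nQuestion: {question}\n"
        if prior_answers:
            prompt += f"\nOther agents' previous answers:\n{shared_notes}\n"
        prompt += "\nGive your best final answer."
        answers.append(call_llm(prompt))
    return answers
```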
Stop early, save cost: More rounds don’t always help. Too much refinement can kill diversity. They use an LLM judge to decide when to stop. That keeps accuracy high while cutting costs almost in half.
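A rough sketch of that stopping loop, reusing `call_llm` and `run_round` from above. The judge prompt and the minimum-rounds guard are my reading of the mechanism, not the paper's exact criterion:

```python
def judge_says_stop(question: str, answers: list[str]) -> bool:
    """Ask an LLM judge whether the committee has effectively converged."""
    verdict = call_llm(
        f"Question: {question}\n\nCandidate answers:\n"
        + "\n".join(f"- {a}" for a in answers)
        + "\n\nAre these answers consistent and confident enough to stop "
        "refining? Reply YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

def tumix(question: str, max_rounds: int = 4, min_rounds: int = 2) -> list[str]:
    answers: list[str] = []
    for r in range(1, max_rounds + 1):
        answers = run_round(question, answers)
        # Don't let the judge stop too early: a couple of rounds of
        # note-sharing are needed before agreement means anything.
        if r >= min_rounds and judge_says_stop(question, answers):
            break
    return answers
```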
Better than existing methods: Compared with other tool-augmented scaling tricks, TUMIX consistently scores higher on tough reasoning benchmarks (HLE, GPQA-Diamond, AIME). For Gemini-2.5 Pro, it pushed HLE to 34.1%, which is a notable gain.
Diversity is the secret sauce: Combining text, code, and search agents beats repeatedly sampling the best single agent. More diverse tool use = more chances to land on the right reasoning path.
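After the last round you still have one answer per agent, so you need a selection step to turn the diverse committee into a single prediction. A simple majority vote is the obvious way to do that (the normalization and tie-breaking here are my own choices, not the paper's):

```python
from collections import Counter

def pick_answer(final_answers: list[str]) -> str:
    """Majority vote over the agents' final-round answers.
    Compares normalized strings, returns the original text."""
    normalized = [a.strip().lower() for a in final_answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    return next(a for a in final_answers if a.strip().lower() == winner)
```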
Auto-agent design: They even had the LLM generate new agent types and mixed those in, which boosted results further. The sweet spot was around 12–15 different agent styles in the mix.
arxiv.org/abs/2510.01279
Track trending AI papers here: nlp.elvissaravia.com