The token-budget question is basically an ROI question. And calculating AI ROI is hard because a) everyone gets a "use whatever you can" directive, and b) everyone ends up using Opus 4.7 or GPT 5.5 for routine things like GitHub updates.
Both happen because no agent knows its own abilities, and so no agent can be a true agent. They're Homo Agenticus, and our normal tricks don't work. You can't just get another model to supervise [1]. You can't just assume the models themselves are aligned and so things will be ok [2]. You can't rely on the models taking initiative and thinking beyond the scope of the task [3].
That's what MarketBench [4] was meant to help answer. It was meant to be the first genuine test of whether and how a model can actually participate in a market, for which it needs to "know" itself! Andrey Fradkin and I have been trying to figure out better ways to solve this problem by teaching the models to know their own limitations and abilities, and then using that knowledge to put them to better use!
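
To make the ROI framing concrete, here is a minimal sketch of what a budget decision could look like if a model could report a trustworthy estimate of its own success rate. This is not how MarketBench works. The `ModelOption` class, the tier names, the costs, the success probabilities, and the task value are all hypothetical numbers made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """A hypothetical model tier; every number here is an assumption."""
    name: str
    cost_per_task: float  # assumed dollars of token spend per attempt
    p_success: float      # the model's self-estimated chance of finishing the task


def expected_roi(option: ModelOption, task_value: float) -> float:
    """Expected net return per dollar of token spend."""
    expected_gain = option.p_success * task_value
    return (expected_gain - option.cost_per_task) / option.cost_per_task


def pick_model(options: list[ModelOption], task_value: float) -> ModelOption:
    """Choose the option with the highest expected ROI for this task."""
    return max(options, key=lambda o: expected_roi(o, task_value))


# Illustrative only: a routine task worth about $1 to the business.
options = [
    ModelOption("frontier", cost_per_task=2.00, p_success=0.99),
    ModelOption("small", cost_per_task=0.05, p_success=0.95),
]
best = pick_model(options, task_value=1.0)
print(best.name)
```

The whole calculation hinges on `p_success`. If the model can't estimate its own ability, that number is a guess, and so is the ROI.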