Imagine you now had 6 options. Do you run all 6 in parallel? That gives the best completion rate, but it is also the most wasteful. Even with tokens we do things like speculative decoding! So we have to find better ways of doing the mix.
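To put rough numbers on "best, but the most wasteful", here is a minimal sketch. It assumes each option has a fixed cost and an independent chance of success, and it compares running everything at once against trying options one at a time and stopping at the first success. The function names and the numbers (six options, cost 1, 50% success each) are made up for illustration.

```python
from math import prod

def parallel(options):
    """options: list of (cost, p_success). Run every option at once."""
    cost = sum(c for c, _ in options)
    # At least one succeeds unless all fail (independence assumed).
    p_done = 1 - prod(1 - p for _, p in options)
    return cost, p_done

def sequential(options):
    """Try options in order and stop at the first success."""
    expected_cost = 0.0
    p_reach = 1.0  # probability that every earlier option failed
    for c, p in options:
        expected_cost += p_reach * c  # we only pay if we get this far
        p_reach *= 1 - p
    return expected_cost, 1 - p_reach

# Hypothetical numbers: six options, each costs 1 unit and works half the time.
options = [(1.0, 0.5)] * 6

print("parallel:  ", parallel(options))
print("sequential:", sequential(options))
```

Under these assumptions both strategies finish the task equally often, but the sequential one pays for far fewer attempts on average. What it gives up is latency, since in the worst case you wait for all six in a row.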

Things get more complex when you don't know which agent is H and which is L. For instance, if you had GPT 5.5 and Opus 4.7, how do you choose which tasks to route where? Or do you run both every time? That's when it gets interesting.

Three rules are available:

1. Always use H. Simple, but you overpay on tasks that L could have handled.

2. Always use L. Cheap, but you fail on tasks that need H.

3. Run both in parallel, take whichever works. Highest completion rate, but you pay for redundant work even when one agent alone would have sufficed.
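The three rules can be compared with a small sketch. It assumes each agent has a flat cost per task and an independent probability of completing it; the `Agent` class, the function names, and the numbers for H and L are all hypothetical, chosen only to show the shape of the trade-off.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    cost: float       # cost per task, arbitrary units (assumed)
    p_success: float  # chance of completing a task (assumed)

def always(agent):
    """Rules 1 and 2: send every task to a single agent."""
    return agent.cost, agent.p_success

def both_in_parallel(h, l):
    """Rule 3: run both, take whichever works. You always pay for both."""
    cost = h.cost + l.cost
    p = 1 - (1 - h.p_success) * (1 - l.p_success)  # independence assumed
    return cost, p

# Hypothetical agents: H is strong and expensive, L is weak and cheap.
H = Agent(cost=10.0, p_success=0.9)
L = Agent(cost=1.0, p_success=0.6)

rules = {
    "1. always H": always(H),
    "2. always L": always(L),
    "3. both in parallel": both_in_parallel(H, L),
}

for name, (cost, p) in rules.items():
    print(f"{name:20s} cost={cost:5.1f}  completion={p:.2f}  "
          f"cost per completed task={cost / p:.2f}")
```

With these made-up numbers, rule 3 does get the highest completion rate, but it buys only a few extra points over always using H while paying for L on every task.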

-> 3 as described is so weird …

May 4 at 5:47 PM