98.99% uptime on Claude API over 90 days sounds great.
That's about 21 hours of downtime per quarter. If your agent runs on a schedule, it runs during some of them.
Add rate limits, OAuth token expiry mid-cron, regional degradation, model deprecations, capability gaps, and the cost spikes that hit when a loop misbehaves at 2am, and "we have one provider" stops being a strategy.
The cheapest piece of resilience I have ever shipped is a $20 OpenRouter account. It is not a cost. It is insurance against the morning the primary has a bad hour.
The bigger surprise: the same $20 became an extension cord for capabilities I deliberately keep off my primary agent stack. Image generation. Long-context refactors. Cheap classification passes I don't want to spend Opus tokens on. None of them earn an extra SDK in the core loop. They live on the extension side instead.
Half insurance. Half extension. Full write-up of the 5-rung mechanism in this week's Digital Thoughts.