Lately I’ve been thinking about why so many software systems feel slower, more expensive, and more brittle than they should — even when compute keeps getting cheaper and faster.
The issue isn’t really CPUs or GPUs anymore. It’s memory.
Modern systems rely on holding more and more state close to compute. When that state fits in fast memory, everything works fine. When it doesn’t, systems either pull it from slower storage or throw it away and recompute it.
Both options are expensive.
At scale, this shows up as weird latency spikes, wasted compute, higher power usage, and infra costs that don’t make intuitive sense. Even a small percentage of recomputation can quietly burn thousands of dollars per GPU every month.
AI has made this problem more obvious, but it didn’t create it. Long-running services, real-time pipelines, personalization, coordination-heavy workflows — they all hit the same wall.
The core tradeoff keeps coming back to one question: do you hold state, or do you redo the work?
I wrote a short piece breaking this down from a systems perspective — not AI hype, just infrastructure reality.
Curious how others are seeing this show up in production systems lately.