In last week’s state of the themes (images attached), we detailed the potential for flash to rival DRAM, or at least to be used in a way that makes the DRAM tax less severe.

We’ve gotten three flash-focused announcements from three major players in the past couple of weeks. The core reason for this focus is that AI workloads are creating new memory objects that are too large for HBM but too valuable for cold storage. It’s simply too expensive to use DRAM for everything if NAND can handle some of it.

What we’ve seen so far:

Nvidia is using flash to store reusable KV cache via CMX. This targets long-context, multi-turn and agentic inference by extending GPU memory with a shared KV-cache tier, allowing flash to become context memory for long-running agents.

AMD is using flash to store some system memory until it is predicted to be hot enough to go to DRAM. They’re buying MEXT as a software layer to accomplish this.

Apple is using flash to store some model weights until needed: inactive experts sit in flash on the device side (AFM 3 Core Advanced, published a week ago).

If you can get the system to know what data is likely to be needed next, flash goes from simple storage to cheaper memory.
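To make that concrete, here is a toy sketch of the idea, not how CMX, MEXT, or Apple's on-device stack actually work. It assumes a small LRU "DRAM" tier in front of a large "flash" tier, plus a simple learned "what usually comes next" table that pulls the predicted item into DRAM ahead of time. The names `TieredStore` and `run`, the slot counts, and the access trace are all hypothetical.

```python
from collections import Counter, OrderedDict, defaultdict


class TieredStore:
    """Toy two-tier store: small LRU 'DRAM' in front of a large 'flash' tier."""

    def __init__(self, dram_slots, prefetch=True):
        self.dram = OrderedDict()               # keys resident in DRAM, oldest first
        self.dram_slots = dram_slots
        self.prefetch = prefetch
        self.successors = defaultdict(Counter)  # key -> counts of the key seen next
        self.last_key = None
        self.dram_hits = 0
        self.flash_reads = 0                    # demand misses served from flash

    def _promote(self, key):
        # Place key in DRAM as most recently used, evicting the oldest if full.
        self.dram[key] = None
        self.dram.move_to_end(key)
        while len(self.dram) > self.dram_slots:
            self.dram.popitem(last=False)

    def access(self, key):
        if key in self.dram:
            self.dram_hits += 1
        else:
            self.flash_reads += 1
        self._promote(key)

        # Learn which key tends to follow the previous one.
        if self.last_key is not None:
            self.successors[self.last_key][key] += 1
        self.last_key = key

        # Assumption: prefetches happen in the background and cost no stall.
        if self.prefetch and self.successors[key]:
            predicted = self.successors[key].most_common(1)[0][0]
            self._promote(predicted)


def run(trace, dram_slots, prefetch):
    """Replay a trace and return (dram_hits, flash_reads)."""
    store = TieredStore(dram_slots, prefetch)
    for key in trace:
        store.access(key)
    return store.dram_hits, store.flash_reads


if __name__ == "__main__":
    # Hypothetical workload: 8 objects read in a repeating order, 4 DRAM slots.
    trace = list(range(8)) * 5
    print("LRU only:     ", run(trace, dram_slots=4, prefetch=False))
    print("with predictor:", run(trace, dram_slots=4, prefetch=True))
```

On this deliberately friendly trace, plain LRU misses on every access because the working set is twice the DRAM size, while the predictor version hits almost every time once it has seen one full pass. Real workloads are far less predictable, so treat this as an illustration of why prediction quality is what turns flash into usable memory.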

Whether this is yet another Jevons paradox situation, where the DRAM shortage stays just as severe but DRAM gets used more effectively, or something that lays the groundwork for DRAM margins to take a hit as lower-margin (and easier-to-produce, especially by China) flash becomes more capable, is in the eye of the beholder.

But I certainly would not be shorting flash right now.

Jun 16 at 2:46 AM