
Instead of storing a perfect record of every token it has ever seen, an SSM compresses the entire history of the conversation into a single, fixed-size mathematical box (a state vector). When a new token arrives, the model folds it into that same state vector; nothing grows.

Pro: O(1) memory during generation, constant no matter how long the context gets. Theoretically the perfect edge architecture.

What it costs you: precision. Because the state is fixed-size, it is a lossy compressor of everything it has seen. It gets especially messy under quantization, because every rounding error is folded back into the state and carried forward.

Mar 20 at 5:23 PM