Hugo (@robonaissance)

V1 simple cells and sparse autoencoders are the same algorithm.

Olshausen and Field showed in 1996 that simple-cell-like receptive fields emerge from a sparsity constraint. They learned a dictionary of basis functions on natural image patches, with a sparsity penalty on the coefficients and a reconstruction loss on the reconstructed patch. The features that emerged were Gabor-like edge detectors: localized, oriented, and bandpass, like the simple-cell receptive fields Hubel and Wiesel had been recording from V1 for decades.
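A minimal sketch of that sparse-coding loop, in plain Python so it needs no libraries. It alternates two steps: with the dictionary fixed, infer sparse coefficients for each patch; then nudge the dictionary atoms to reduce reconstruction error and renormalize them. Assumptions: all function names are hypothetical, and the L1 penalty solved with ISTA (iterative shrinkage-thresholding) is a standard sparse-coding choice swapped in here. Olshausen and Field used their own sparsity cost (for example log(1 + a²)) and their own optimizer.

```python
import math
import random

# Hypothetical sketch: an L1 penalty + ISTA stand in for Olshausen & Field's
# exact sparsity cost and inference procedure.


def soft_threshold(v, t):
    """Proximal step for t*|v|: shrink v toward zero by t, clipping at zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def reconstruct(D, a):
    """Return D @ a. D is a list of rows with shape (n_pixels, n_atoms)."""
    return [sum(d * aj for d, aj in zip(row, a)) for row in D]


def normalize_columns(D):
    """Rescale each atom (column) to unit length, so the sparsity penalty
    cannot be dodged by shrinking the codes and growing the atoms."""
    n_pix, n_atoms = len(D), len(D[0])
    out = [row[:] for row in D]
    for j in range(n_atoms):
        norm = math.sqrt(sum(out[i][j] ** 2 for i in range(n_pix))) or 1.0
        for i in range(n_pix):
            out[i][j] /= norm
    return out


def infer_code(D, x, lam, n_iter=200):
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1 with D held fixed.
    With unit-norm columns, step = 1/n_atoms never exceeds 1/||D||^2."""
    n_atoms = len(D[0])
    step = 1.0 / n_atoms
    a = [0.0] * n_atoms
    for _ in range(n_iter):
        r = [xi - yi for xi, yi in zip(x, reconstruct(D, a))]  # residual x - D a
        grad = [sum(D[i][j] * r[i] for i in range(len(x))) for j in range(n_atoms)]  # D^T r
        a = [soft_threshold(a[j] + step * grad[j], step * lam) for j in range(n_atoms)]
    return a


def learn_dictionary(X, n_atoms, lam=0.1, n_epochs=20, lr=0.05, seed=0):
    """Alternate sparse inference (codes) with a gradient step on the atoms."""
    rng = random.Random(seed)
    n_pix = len(X[0])
    D = normalize_columns([[rng.gauss(0.0, 1.0) for _ in range(n_atoms)] for _ in range(n_pix)])
    for _ in range(n_epochs):
        codes = [infer_code(D, x, lam) for x in X]
        neg_grad = [[0.0] * n_atoms for _ in range(n_pix)]
        for x, a in zip(X, codes):
            r = [xi - yi for xi, yi in zip(x, reconstruct(D, a))]
            for i in range(n_pix):
                for j in range(n_atoms):
                    # negative gradient of 0.5*||x - D a||^2 w.r.t. D[i][j]
                    neg_grad[i][j] += r[i] * a[j]
        D = normalize_columns(
            [[D[i][j] + lr * neg_grad[i][j] for j in range(n_atoms)] for i in range(n_pix)]
        )
    return D
```

In the real experiment `X` would be many preprocessed natural-image patches flattened to vectors, and the columns of `D` are what come out looking like oriented edge detectors. Tiny toy inputs are only useful for checking the mechanics.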

In 2023, Anthropic trained sparse autoencoders on the MLP activations of a small one-layer transformer. Same recipe: a learned dictionary (here, the decoder), a sparsity penalty, a reconstruction loss; the SAE adds an encoder that computes the sparse code in one pass. The features that emerged were interpretable concepts. DNA sequences. Legal language. Hebrew script. The 2024 scaling work extended this to the residual stream of Claude 3 Sonnet.
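A minimal sketch of one SAE forward pass and its loss, again in plain Python. Assumptions: the layout (subtract the decoder bias, encode with ReLU, decode linearly, squared error plus an L1 penalty on the features) follows the general form of the 2023 write-up, but the function names and list-of-rows shapes are hypothetical, and the optimizer and other training details are left out. The columns of `W_dec` play the role of Olshausen and Field's dictionary atoms; in practice there are usually far more features than input dimensions.

```python
# Hypothetical sketch of an SAE forward pass; names and shapes are illustrative.


def relu(z):
    return z if z > 0.0 else 0.0


def matvec(M, v):
    """Return M @ v for M given as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """One SAE pass on an activation vector x (length d_model).
    W_enc: n_features rows of length d_model.
    W_dec: d_model rows of length n_features (its columns are the dictionary).
    Returns (f, x_hat): sparse feature activations and the reconstruction."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]  # subtract decoder bias
    f = [relu(z + b) for z, b in zip(matvec(W_enc, centered), b_enc)]
    x_hat = [y + b for y, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x, f, x_hat, lam):
    """Squared reconstruction error plus an L1 sparsity penalty on f."""
    recon = sum((xi - yi) ** 2 for xi, yi in zip(x, x_hat))
    return recon + lam * sum(abs(fi) for fi in f)
```

For example, with identity `W_enc` and `W_dec`, `b_enc = [-0.5, -0.5]` and `b_dec = [0.0, 0.0]`, the input `[2.0, 0.3]` activates only the first feature (`f == [1.5, 0.0]`): the negative encoder bias acts as a threshold, which is where the sparsity comes from in a single pass.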

The mechanism is the same. Sparsity-constrained reconstruction recovers the latent structure that a black-box system was using internally. Whether the box is mammalian visual cortex or a transformer residual stream, the recipe extracts the dictionary the system was operating with.
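One way to make the "same mechanism" claim concrete is to write the two objectives side by side. This is a sketch with illustrative notation: x is an input vector (an image patch or an activation vector), Φ and W_dec are the learned dictionaries, λ weights the sparsity term, and S is a sparsity cost such as |a| or log(1 + a²). Both minimize reconstruction error plus a sparsity penalty over a learned dictionary; the visible difference is that sparse coding solves for the code a separately for each input, while an SAE computes it with one pass of a learned encoder.

```latex
% Sparse coding (Olshausen & Field style): the code a is optimized per input
\min_{\Phi}\; \sum_{x} \min_{a}\; \Big[\, \lVert x - \Phi a \rVert_2^2 + \lambda \sum_i S(a_i) \Big]

% Sparse autoencoder: the code f(x) comes from a learned encoder
\min_{W_{\mathrm{enc}},\, b_{\mathrm{enc}},\, W_{\mathrm{dec}},\, b_{\mathrm{dec}}}\;
  \sum_{x} \Big[\, \lVert x - (W_{\mathrm{dec}} f(x) + b_{\mathrm{dec}}) \rVert_2^2
  + \lambda \lVert f(x) \rVert_1 \Big],
\qquad f(x) = \mathrm{ReLU}\big(W_{\mathrm{enc}}(x - b_{\mathrm{dec}}) + b_{\mathrm{enc}}\big)
```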

Both fields have their own name for it. Sparse coding, or dictionary learning, in computational neuroscience. Sparse autoencoders in mechanistic interpretability. Same technique. Same problem.

If you understand why V1 fires the way it does, you already understand why SAE features look the way they do.

May 12

1:36 PM