Hugo (@robonaissance)

Neuroscience walked the interpretability path sixty years early

The AI interpretability community is currently working through a set of problems that neuroscience encountered in the 1960s and only partially solved. How do you understand what a complex network is doing when you can record its activity but not its rules? How do you map function to structure when neither is fully observable? How do you decide what counts as evidence that a circuit is doing something specific?

Neuroscience tried single-cell recording, which gave fine resolution but missed network-level behavior. It tried fMRI, which captured network-level activity but at coarse spatial and temporal resolution. It tried optogenetics, which allowed causal interventions but only in genetically engineered organisms. Each technique solved part of the problem and exposed new versions of the rest. After sixty years, neuroscience can describe what some brain regions do for some behaviors. The general problem is unsolved.

Mechanistic interpretability in 2026 is recapitulating this trajectory at higher speed. Activation patching and attribution graphs are the equivalent of single-cell recording. Sparse autoencoders are the equivalent of fMRI's region-level decomposition. Causal interventions on attention heads are the AI equivalent of optogenetics. Each technique solves part of the problem and exposes new versions of the rest.

The lesson worth borrowing from neuroscience is not that the field solved interpretability. It did not. The lesson is that neuroscience has worked on this problem for sixty years, produced powerful partial techniques without a general solution, and kept doing useful research while the deeper question stayed open. The AI labs treating interpretability as definitively solvable are operating on a timeline neuroscience abandoned long ago. Whether AI interpretability arrives at a unifying answer or fragments into partial techniques is the open question. The historical evidence points toward fragmentation.

May 10

7:44 PM
