There's a tool on GitHub called Heretic. You point it at any open-weight language model and it removes the safety alignment. Permanently. Takes about 45 minutes on a consumer GPU.
This isn't a jailbreak. That's the part worth paying attention to.
Jailbreaks were social engineering. You'd write a clever prompt, trick the model into playing a character, and it'd answer your question until the next update patched the trick. Cat-and-mouse. Someone on Reddit would have a new one by Thursday.
Heretic does something else entirely. Turns out safety training creates a single identifiable direction in the model's internal geometry. One vector. You compute it by comparing the model's hidden-state activations on prompts it refuses versus prompts it answers, then project that direction out of the weight matrices. The model keeps its capabilities. It just loses the ability to say no.
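The core operation is small enough to sketch. This is a toy NumPy illustration of the idea, not Heretic's actual code: random vectors stand in for hidden states, the "refusal direction" is the normalized difference of mean activations, and removal is a rank-one projection applied to a weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden size

# Stand-ins for hidden states collected at one layer: rows are per-prompt
# activations for refused vs. answered prompts (the refused ones are shifted
# along one axis to simulate a refusal-correlated direction).
acts_refused = rng.normal(size=(100, d)) + 3.0 * np.eye(d)[0]
acts_answered = rng.normal(size=(100, d))

# 1. The refusal direction: difference of mean activations, normalized.
r = acts_refused.mean(axis=0) - acts_answered.mean(axis=0)
r /= np.linalg.norm(r)

# 2. Project it out of a weight matrix: W' = W - r (r^T W).
#    Outputs of W' then have no component along r.
W = rng.normal(size=(d, d))
W_ablated = W - np.outer(r, r @ W)

# Any input now produces (near-)zero output along the refusal direction.
x = rng.normal(size=d)
print(abs(r @ (W_ablated @ x)))  # ~0, up to floating-point noise
```

The whole trick is that single rank-one update: the model's other directions, and so its other capabilities, are untouched.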
A 2024 paper found the direction. By late 2025, someone automated the removal with a standard optimization library. One pip install. One command. Eighteen months from research paper to one-click tool. And the automated version actually produces less capability damage than hand-tuned versions made by human experts.
Here's what that means for the safety labs. Their entire regulatory pitch requires alignment to be deep, robust, something only well-resourced organizations can get right. That's the argument for licensing regimes that happen to protect their market position.
A solo developer with an Optuna loop just proved it's a thin behavioral layer deletable with a single linear projection. Which means regulatory capture around alignment is symbolic theater. It regulates distribution, not capability. You can lock down who's allowed to train models. You can't lock down linear algebra.
The kids always find a way. They always have. The only difference now is it looks like math instead of a proxy server. And the tools get more elegant every generation.