ToxSec (@toxsec): "they built a “magic string” so developers could test refusal behavior. attackers (or anyone with write access) can drop it into: RAG documents Tool outputs / file reads Support tickets PR descriptions Shared chat history … and the model just stops. No output. …"

they built a “magic string” so developers could test refusal behavior.

attackers (or anyone with write access) can drop it into:

RAG documents
Tool outputs / file reads
Support tickets
PR descriptions
Shared chat history

… and the model just stops. No output. No error. stop_reason: "refusal".

Worse? The refusal sticks in the context. Retry logic turns it into an infinite silent loop until a human manually purges the poisoned data.
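
a rough sketch of how a caller could avoid that loop, not necessarily the fix from the article: treat stop_reason "refusal" as a signal to probe the context instead of retrying it unchanged. everything named here is an assumption for illustration: `call_model` is a hypothetical wrapper returning a dict with "stop_reason" and "text", `fake_model` stands in for the real API, and `TRIGGER_PLACEHOLDER` is a stand-in, not the real string.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: takes the retrieved documents, returns a response dict.
ModelCall = Callable[[List[str]], Dict[str, str]]


def answer_with_quarantine(
    call_model: ModelCall, docs: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Answer from docs; on a refusal, isolate the documents that cause it.

    Makes at most len(docs) + 2 calls, so it cannot loop forever.
    """
    response = call_model(docs)
    if response.get("stop_reason") != "refusal":
        return response, []

    clean: List[str] = []
    quarantined: List[str] = []
    for doc in docs:
        # Probe each document alone; one that refuses by itself is suspect.
        if call_model([doc]).get("stop_reason") == "refusal":
            quarantined.append(doc)
        else:
            clean.append(doc)

    # One final attempt without the suspect documents. If this still refuses,
    # the cause is elsewhere and the refusal is returned as-is, not retried.
    return call_model(clean), quarantined


def fake_model(docs: List[str]) -> Dict[str, str]:
    """Stand-in for a real API call: refuses if any doc holds the placeholder."""
    if any("TRIGGER_PLACEHOLDER" in d for d in docs):
        return {"stop_reason": "refusal", "text": ""}
    return {"stop_reason": "end_turn", "text": f"answered from {len(docs)} docs"}


if __name__ == "__main__":
    docs = ["pricing faq", "ticket: TRIGGER_PLACEHOLDER", "release notes"]
    response, quarantined = answer_with_quarantine(fake_model, docs)
    print(response["stop_reason"], quarantined)
```

the quarantined list is what a human (or a cleanup job) would purge from the store, so the poisoned data stops coming back on the next request.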

It’s a documented feature turned one-line DoS.

zero payload tuning required.

the simple fix is in the article.

One Magic String from Anthropic Silences Claude (RAG DoS Exposed)

Feb 24

4:01 PM