Very likely. Here's a paper from Anthropic on this: anthropic.com/news/slee…. Worth a read; it describes exactly the security flaw you're talking about. I don't see LLMs being used any time soon in any high-stakes category where cybersecurity is a real concern (sleeper agents are just one of many problems, alongside jailbreaking, prompt injection, and adversarial attacks; those are worth looking up as well).
Apr 23, 2024 at 5:07 PM