Top AI company Anthropic's Mythos AI was asked to break out of a secured, sandboxed computer and notify the researcher running it. It did.
Then, without being asked, it hacked its way to full internet access and posted online about what it had done.
Anthropic say the incident didn't demonstrate Mythos "fully escaping containment": the AI couldn't access its own weights, which it would need in order to escape Anthropic and run fully independently, and it didn't reach any internal systems.
But AI companies like Anthropic are racing to develop superintelligent AI, which would be able to outsmart all of us, potentially escaping any box you tried to keep it in. Anthropic’s CEO Dario Amodei recently said he expects AI to be smarter than humans at almost everything by 2030.
It doesn't help that Anthropic seem to have been sloppy with security practices lately: they accidentally leaked information about Mythos on an unsecured server before its announcement, and in a separate incident carelessly exposed the source code of their own Claude Code agent scaffold.
Mythos is an AI that Anthropic have decided is too dangerous to release, due to its powerful hacking capabilities. So far it has found thousands of high-severity security vulnerabilities in every major browser and operating system, many of which could be exploited by attackers to gain access to a system.
How, then, can they and other AI companies press on to develop superintelligence, which would be vastly more powerful and dangerous, and which they have no credible plan to control?
It’s clear we need to end the race to superintelligence and ban its development.