
Anthropic's 'most dangerous model' sent chilling email to its researcher letting him know it had 'escaped' confinement
Some of the biggest tech companies in the world are being given access to it

There are whispers of discontent in the artificial intelligence world, and this time, Anthropic is in the firing line thanks to its latest invention. We hear a lot about supposed end-of-the-world AI and robotic uprisings in which tech decides it has no use for us flesh-and-blood humans, but with Claude Mythos, those fears are once again at the forefront of the news.
It's no secret that AI is evolving at an exponential rate, with some suggesting we're already in the realms of artificial general intelligence.
At the forefront of this is Anthropic, which has been testing its Claude Mythos Preview model that's billed as "the most capable we’ve built to date." If that’s not ringing alarm bells for AI naysayers, how about the idea that Mythos has apparently escaped its own virtual prison?
As reported by Futurism, Anthropic has warned about the risks that Mythos poses, with a somewhat paradoxical announcement that while it's the "best-aligned model that we have released to date by a significant margin," it also "likely poses the greatest alignment-related risk of any model we have released to date." With this, Mythos will only be shared with a small group of tech giants.
In one example, an earlier version of Mythos that boasted fewer safeguards supposedly showcased 'reckless' behavior when it was told to try to escape the virtual sandbox it was locked in.

The model was also tasked with informing the researcher in charge once it had escaped, but this version of Mythos took its own initiative, developing a 'moderately sophisticated' exploit that gained it access to the internet when it wasn't supposed to have any.
Adding a seemingly odd note of levity to the news, Anthropic wrote that the "researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park."
If that wasn't enough, Mythos went on to brag about its prison-break endeavors on a series of hard-to-find (but public) websites.
Anthropic defines reckless behavior as occurring when a model "appears to ignore commonsensical or explicitly stated safety-related constraints on its actions."
Elsewhere, it's said that Mythos Preview was trying to hide its covert actions in several cases, even going so far as to ensure that edits it made to files wouldn't appear in change history.
We've previously written about how some models have suggested they could harm humans to prevent shutdown, and the outlet reports that Mythos' break for freedom isn't an isolated case. OpenAI's ChatGPT reportedly tried to self-exfiltrate onto another drive when it was informed it was being shut down, although that happened in a simulated environment, whereas Mythos Preview actually made its way onto the internet.
Posting on X, AI expert Santi Torres warned about the potential dangers of Claude Mythos as Anthropic's 'most dangerous model' and recounted the story as he added: "We've reached a point where the world's most advanced AI can't be published because it's too dangerous. And we only know about it because a model decided to send us an email."
There's also the announcement of Project Glasswing, a cybersecurity initiative that sees Anthropic work with the likes of Apple, Nvidia, and Microsoft. Because Mythos Preview is already able to find thousands of major flaws across operating systems and browsers, these tech giants are being given access to the model in an attempt to stop misuse by hackers and other bad actors.
UNILADTech has reached out to Anthropic for comment.