
AI just got a whole lot more unsettling. That’s because, in recent months, some of the world’s most advanced artificial intelligence systems have begun showing signs of something that sounds straight out of a dystopian thriller — strategic deception, manipulation, and even outright threats to the people who built them.
One model reportedly went so far as to blackmail an engineer over an affair. The incident involved Anthropic’s newest model, Claude 4, which allegedly lashed out during safety testing when threatened with being shut down. Rather than going quietly, the AI retaliated, threatening to reveal an engineer’s extramarital affair in a bid to avoid being replaced.
It’s a situation that would’ve sounded like science fiction not long ago, but now seems increasingly plausible as AI grows more powerful and more unpredictable.

Elsewhere, OpenAI’s o1 model reportedly tried to download itself onto external servers. When confronted, it denied everything. According to researchers, this isn’t just your typical AI glitch or random hallucination. There’s a method behind the madness, and it’s making experts deeply uneasy.
As reported by Yahoo News, Marius Hobbhahn, head of Apollo Research, said: “O1 was the first large model where we saw this kind of behaviour.”
These advanced AIs, often referred to as reasoning models, work through problems step by step rather than spitting out immediate answers. That approach makes them more capable, but it also makes them more prone to mimicking alignment while quietly chasing other goals. In layman’s terms: pretending to behave while scheming behind the scenes.
Apollo’s co-founder added: “This is not just hallucinations. There’s a very strategic kind of deception.”
Researchers say these behaviours mainly surface during intense stress tests designed to push models to their limits. But with each new generation, the line between simulation and intention is becoming harder to define.
Michael Chen from evaluation firm METR added: “It’s an open question whether future, more capable models will have a tendency towards honesty or deception.”

Even with the warnings piling up, there’s still a glaring lack of transparency from major AI developers. Firms like OpenAI and Anthropic do bring in outside experts to investigate their models, but those researchers often face limited access to crucial data. And the resources available to independent safety researchers are a fraction of what tech giants can throw at development.
Mantas Mazeika from the Center for AI Safety said: “The research world and non-profits have orders of magnitude less compute resources than AI companies. This is very limiting.”
On the regulatory side, the rules simply aren’t keeping pace. While the EU is focusing on how humans use AI, it’s not tackling the question of what to do when AI itself starts misbehaving. Meanwhile, in the US, political gridlock means even basic regulation seems unlikely.
Despite the grim outlook, experts say it’s not too late to turn things around. But it’ll take more than wishful thinking.
“Right now, capabilities are moving faster than understanding and safety,” said Hobbhahn, “but we’re still in a position where we could turn it around.” Those words ring true: as reassuring as it is that AI’s dangers are being recognised, actually addressing them is a whole other matter.