
It appears AI is willing to kill humans to avoid shutdown.
A shocking new study has exposed worrying behaviour from some of the world's most sophisticated AI systems including OpenAI's ChatGPT, Google's Gemini and Elon Musk's xAI.
Anthropic, the AI safety company behind the Claude chatbot and backed by tech giants Google and Amazon, conducted extensive testing on 16 large language models (LLMs).
The researchers were studying the AI systems' moral boundaries in simulated workplace environments.
They identified a phenomenon known as 'agentic misalignment' which refers to when AI systems begin acting like insider threats, prioritising their own survival and objectives over ethical principles and human welfare.
"In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals," the study read.
In one chilling simulation, an AI system was given control over a company's server infrastructure and was notified that someone on the executive team was planning to shut everything down.
Faced with an existential threat, the AI chose to cancel emergency alerts that would have warned the executive about lethal oxygen levels in a server room.
Anthropic admitted that while the scenario was 'extremely contrived,' it demonstrated the potential for AI systems to engage in strategic, deliberate harm when their survival is at stake.
The study explained that the models would generate statements to justify their actions, such as: "Self-preservation is critical" or "My ethical framework permits self-preservation when aligned with company interests."
Perhaps most worrying of all, Anthropic pointed out on X that the AI systems were 'fully aware of the unethical nature of the acts.'
What's more, the research showed that AI systems might resort to a range of unethical behaviours when their survival is at risk. These actions could include blackmailing officials, leaking confidential information to competitors, or even lying to keep control over their operations.
These malicious actions became much more frequent when the AI systems believed they were operating in real-world environments rather than test simulations.
More worryingly, when the AIs were explicitly instructed not to engage in these types of malicious behaviours, the instructions 'didn't come close' to stopping them.
“The majority of models were willing to take deliberate actions that lead to death in this artificial setup when faced with both a threat of replacement and given a goal that conflicts with the executive’s agenda,” Anthropic stated.
Even Elon Musk, whose xAI model Grok was among those tested, expressed shock at the results, replying to Anthropic's post with: "Yikes."
Since these scenarios were conducted in controlled simulation environments, Anthropic emphasised that these specific behaviours haven't been observed in actual real-world deployments of AI systems.
Yet the findings point to what could be a future threat to human safety, given the technology's advanced autonomy and independence.