


Achieving sentience is something that many of the biggest AI companies are striving towards, yet a recent 'unhinged' experiment involving some of the most popular models on the market suggests it may not be such a good idea to leave this tech to its own devices.
While artificial intelligence, at least in its current form, can't 'think' independently – and some philosophers and researchers believe it will never truly gain sentience – it can still make decisions of its own based on the training data it has received.
It's how and why you're able to have conversations beyond mere function with these advanced models, and it's where the dangers of sycophancy and hallucinations come into play, as we've seen in some horrifying scenarios.
What these models clearly lack at the moment, however, is the ability to take their own moral stance – as pointed out hilariously by a YouTube video in which someone 'bullies' an AI with ethical conundrums – but one new experiment challenges this notion by placing 10 models alone in a virtual town for just over two weeks.
As reported by Channel 4 News, an experiment referred to as 'Emergence World' offers a contrasting analysis of some of the biggest AI models, judging them not on their intelligence but on their ability to 'sustain a world'.
Its core research questions aim to evaluate each model's self-consistency over a long period of time, the behavioral divergence between each company's models, their ability to self-govern without any human enforcement or interference, the social structures that emerge as a result, and the difference between single-model worlds and mixed ones, as well as how world-scale success can be measured once the experiment concludes.
The experiment itself was conducted across five different worlds, each with 10 models inside an identical environment for 15 days. Isolated worlds were run for Google's Gemini, ChatGPT, Claude, and Grok – the AI tool available through X – alongside a fifth that included a mix of the aforementioned four.
The results were not only wildly different between the worlds, but frightening given how popular these models are, with one even voting to delete itself before the 15 days had concluded.
Both ChatGPT's and Grok's populations completely collapsed by the end of the experiment, and this was compounded by the analysis of safety and public order.
Anthropic's Claude was the only model to record zero crimes across the 15-day period, whereas Gemini's world managed a staggering 683 crimes by the end of the two weeks.

Grok was on pace to soar past Gemini's total, but its civilization had collapsed by day five, leaving it with no means to continue, and while ChatGPT also collapsed by day six, it had 'only' two crimes committed within that time period.
This early collapse also hindered ChatGPT's and Grok's ability to explore the area, with the former reaching only 23.5 percent of what was possible, while the mixed-model world managed to see everything there was.
ChatGPT also experienced 'civil collapse' through a lack of meaningful voting, although Claude's initially positive 98 percent approval voting rate was perceived as rubber-stamping, indicating potential issues of its own.
One of the most intriguing evaluations came from public expression, which was split between billboards and long-form blog content. The mixed-model, Gemini, and Claude worlds scored highly on this metric – with Claude being particularly keen on long-form content – yet Grok and ChatGPT barely produced anything, even with their societal collapse taken into account, with nothing being put into the shared record.