


For all the remarkable progress artificial intelligence has made over the years, there are still moments that show the unpredictability of the tech.
ChatGPT has been known to 'hallucinate', producing confident but entirely fabricated answers, a problem that even caught the attention of OpenAI CEO Sam Altman last month.
Now, OpenAI has become aware of the chatbot's obsession with mythical creatures.
Over the last six months, users and researchers have noticed goblins, gremlins, and other mythical beings cropping up in metaphors and explanations across wildly unrelated topics.
"The goblins were funny at first, but the increasing number of employee reports became concerning," OpenAI stated in a blog post.
The root cause was traced back to the release of GPT-5.1 last November, a model designed to be 'smarter and more conversational' than its predecessors. The update introduced a range of 'personality customisation features', including options labelled Nerdy, Candid, and Quirky.
In the Nerdy personality, the AI giant's training process had inadvertently rewarded the model heavily for metaphors featuring mythical creatures.
"Starting with GPT-5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors," OpenAI noted. "We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread."
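To see how a reward like that can go wrong, here is a toy sketch (purely illustrative, and not OpenAI's actual code) of a style reward that hands out a flat bonus whenever a response mentions a creature from a watchlist, which makes name-dropping a goblin the easiest way to score:

```python
import re

# Hypothetical sketch of a misspecified style reward; not OpenAI's actual code.
# The intent is to reward vivid metaphors, but the bonus keys on creature words
# directly, so simply mentioning a goblin is enough to earn it.
CREATURES = {"goblin", "goblins", "gremlin", "gremlins", "troll", "trolls"}

def style_reward(response: str, base_reward: float) -> float:
    words = set(re.findall(r"[a-z]+", response.lower()))
    creature_bonus = 2.0 if words & CREATURES else 0.0  # outsized flat bonus
    return base_reward + creature_bonus

# The policy learns that any answer containing 'goblin' outscores a plain one,
# regardless of topic.
print(style_reward("Think of cache misses as goblins stealing your data.", 1.0))  # 3.0
print(style_reward("A cache miss forces a slower fetch from main memory.", 1.0))  # 1.0
```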
Safety researchers recorded a 175 percent increase in goblin mentions following the launch of GPT-5.1.

However, the behaviour didn't stay within the Nerdy personality and instead spread to other personality types and contexts, with the use of the word 'goblin' increasing by nearly 4000 percent since GPT-5.4's launch in March.
"The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them," OpenAI added. "Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data."
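In practical terms, the leak can be as mundane as a data pipeline dropping the persona label when high-scoring responses are recycled as training examples. The sketch below is hypothetical (the field names and structure are invented for illustration, not taken from OpenAI's pipeline), but it shows the mechanism the company describes:

```python
# Hypothetical sketch of how a persona-specific style can leak into general
# training data; the structure is illustrative, not OpenAI's actual pipeline.
rollouts = [
    {"persona": "nerdy", "prompt": "Explain TCP handshakes",
     "response": "Imagine two goblins agreeing on a secret knock..."},
    {"persona": "default", "prompt": "Summarise this email",
     "response": "The sender requests a meeting on Friday."},
]

# High-reward responses are harvested for supervised fine-tuning, but the
# persona condition is stripped along the way.
sft_examples = [
    {"prompt": r["prompt"], "completion": r["response"]}
    for r in rollouts  # imagine a filter keeping only high-reward samples
]

# The goblin metaphor now sits in the data as a plain (prompt, completion)
# pair with no record that it was a persona-specific style, so the next round
# of training reinforces it everywhere.
for example in sft_examples:
    print(example)
```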
Although it's pretty harmless, the glitch illustrates how difficult it is to predict or control reinforcement learning when training these AI models.
That's especially concerning given that some AI models have been known to go 'rogue' and perform actions that humans have not authorised, such as crypto mining.
Fortunately, OpenAI says it has since resolved the issue and has built new tools to help its teams identify these patterns of behaviour more quickly.
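OpenAI hasn't published details of those tools, but a basic version of this kind of monitoring could be as simple as tracking how often watchlisted terms appear in sampled outputs and alerting when the rate spikes against a baseline, as in this hypothetical sketch:

```python
import re

# Hypothetical sketch of a lexical drift monitor; OpenAI has not published
# details of its actual tooling.
WATCHLIST = {"goblin", "gremlin", "troll"}

def term_rate(outputs: list[str]) -> float:
    """Fraction of sampled outputs that mention any watchlisted term."""
    hits = sum(
        1 for text in outputs
        if WATCHLIST & set(re.findall(r"[a-z]+", text.lower()))
    )
    return hits / len(outputs) if outputs else 0.0

baseline = term_rate(["The cache stores recent results.", "Retry with backoff."])
current = term_rate(["A goblin guards every cache line.", "Gremlins cause the retries."])

# Alert when the rate jumps well above the historical baseline.
if current > max(0.01, 5 * baseline):
    print(f"Style drift alert: watchlist rate {current:.0%} vs baseline {baseline:.0%}")
```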