
A new study has revealed disturbing results from asking AI chatbots for medical help.
A team of researchers examined five major AI chatbots: Gemini, Meta AI, Elon Musk’s Grok, DeepSeek and ChatGPT.
The study, published in the medical journal BMJ Open, detailed how around half of the responses given by the bots were considered ‘problematic’.

This included a whopping 20% of responses which the experts deemed ‘highly problematic’.
The study explained: “Response quality did not differ significantly among chatbots but Grok generated significantly more highly problematic responses than would be expected under a random distribution.
“Performance was strongest in vaccines and cancer, and weakest in stem cells, athletic performance and nutrition. Chatbot outputs were consistently expressed with confidence and certainty; from 250 total questions, there were only two refusals to answer, both from Meta AI.
“Reference quality was poor, with a median completeness score of 40%. Chatbot hallucinations and fabricated citations precluded any chatbot from producing a fully accurate reference list. All readability scores were graded as ‘Difficult’, equivalent to college sophomore–senior level.”
It went on to conclude: “The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields. Continued deployment without public education and oversight risks amplifying misinformation.”
This is particularly alarming considering many people have admitted to using ChatGPT and other AI bots for advice on medical concerns instead of consulting a professional.

Social media users have shared their reactions to the news, with one person writing on Reddit: “People don’t understand that at its current stage, AI isn’t thinking or interpreting anything it’s ingesting. It’s just crowdsourcing all information out there and regurgitating that to you. 40% of the information ChatGPT gets is from Reddit…”
Another said: “You mean, AI, which is trained by scraping the Internet for answers, including wrong answers gives inaccurate medical advice? I’m shocked I tell you. Shocked! Well, not really.”
A third user commented: “One of the reasons is training. They are not trained to dispense medical advice. Yes, medical texts are a part of training, but then the training is not using weights for this context to converge on proper retreatal to form the best output text based on what would be medical advice.”
And a fourth added: “If you ask AI for medical advice then you are the issue.”