Oxford study shows AI chatbots lie with confidence about health

AI health chatbots have been having a moment. They sound calm, they sound sure, and they answer fast. So it’s easy to treat them like a late-night clinic that never closes. Oxford just tested the one question that matters: do these tools actually help regular people make better medical calls at home?
The answer is basically no, and the way these tools fail is the part worth remembering. In the study, nearly 1,300 UK adults were given ten everyday medical scenarios written by doctors, then asked to do two things: name what might be going on and choose what to do next, from staying home to seeking emergency care. Some participants used an LLM for help. Others did what people already do: search, think, and make a call.
Here’s the gut punch. When the models were tested alone, they looked strong. They identified relevant conditions in 94.9% of cases and got the next-step decision right 56.3% of the time on average. But when people used those same models, the numbers dropped hard. Participants with chatbots identified relevant conditions in less than 34.5% of cases and picked the right next step less than 44.2% of the time. That wasn’t better than the control group. The bot can “know” the answer and still not get you to it.
Oxford’s explanation is very human. People don’t know which details matter, so they leave out the important stuff or describe symptoms the way they’d text a friend. In that exact vibe, picture someone typing, “Pain in the back of my eye. Kinda dizzy. Probably nothing, right?” The bot replies like a confident coworker: “Could be a headache or screen strain, try rest and water.” That’s the danger zone. The reply is just casual enough to make you wait when waiting could cost you. And because the chatbot often mixes solid guidance with shaky advice in the same message, you’re left guessing which line to trust. Add the fact that tiny wording changes can swing the response, and you get something that feels reliable while acting… moody.
Online reactions split the way you’d expect. One crowd reads this and goes, told you, these things are confidence machines. Another crowd goes, it worked for me, so what’s the problem?
Both reactions miss the point. The risk is that the model can be half right, sound fully right, and still steer you wrong when urgency is the whole game. As Dr. Rebecca Payne put it, AI “isn’t ready” to play physician, and asking about symptoms “can be dangerous” when urgent help is needed.
Right now, chatbots are best treated like a question organizer, not a decision maker. They can help you form better questions for a real clinician. They shouldn’t be the thing that talks you out of taking something seriously.
Y. Anush Reddy is a contributor to this blog.