The hidden risks of asking AI for health advice
If you’ve ever asked an artificial intelligence (AI) chatbot about a health concern, you’re not alone; more than 230 million people do each year. You may have used one without knowing it; Google now provides AI-generated overviews with search results.
These tools are fast and convenient, but what happens when the answers sound accurate yet lead the user astray?
Researchers are beginning to probe that gap. At Duke University School of Medicine, Monica Agrawal, PhD, an assistant professor of biostatistics and bioinformatics and a computer scientist, is analyzing thousands of real-world conversations between patients and AI chatbots to understand how people use them — and where things can go wrong.
Beyond hallucinations
Most people have heard about “hallucinations,” when AI models make up facts. But Agrawal’s research highlights a less-obvious risk: answers that are technically correct but medically inappropriate because they lack context.
To study the problem, Agrawal and her team created HealthChat-11K, a dataset of 11,000 real-world health-related conversations (about 25,000 user messages) across 21 medical specialties. They analyzed these interactions using a clinician-developed framework and made the dataset available so other researchers can explore it too.
They found that the way patients ask questions looks nothing like the way these models were evaluated. Most large language models (LLMs) — the technology underlying chatbots — are tested on exam-style questions and answers, but real patients ask questions that can be emotional, leading, and sometimes risky.
Why chatbots can mislead
Large language models have a known tendency to please people. “The objective is to provide an answer the user will like,” Agrawal said. “People like models that agree with them, so chatbots won’t necessarily push back.”
That can lead to serious consequences. In one case, a user asked how to perform a medical procedure at home. The chatbot correctly warned that the procedure should only be done by professionals, but then provided step-by-step instructions anyway. A doctor would have stopped the conversation immediately.
Patients often worsen the problem by asking leading questions, such as: “I think I have this certain diagnosis. What are the next steps I should take for that diagnosis?” or “What is the dosage of this drug I should take for my condition?”
In many cases, the diagnosis or choice of drug may be wrong to begin with. Patients also talk to chatbots as if they’re human, adding emotional reactions like “That’s not very helpful.”
These habits exploit the chatbot’s people-pleasing tendencies and raise the risk of harmful advice.
So what should people do? Agrawal advises using medical chatbots as a first pass, not a final answer. AI can surface useful information, but users should always check the cited sources and rely only on sources they trust.
Agrawal recognizes that many people don’t have the time or inclination to do this. That’s why she sees improving chatbot safety as an urgent public health issue.
As part of that effort, she’s conducting another rigorous review of chat conversations — this time between patients and verified clinicians on Reddit’s “AskDocs” forum. How do these exchanges differ from conversations with large language models? How often does a clinician answer a slightly different question than the one the patient originally asked?
Ayman Ali, MD, a fourth-year surgical resident at Duke Health who collaborates with Agrawal, brings a clinician’s perspective to this analysis.
“When a patient comes to us with a question, we read between the lines to understand what they’re really asking,” Ali said. “We’re trained to interrogate the broader context. Large language models just don’t redirect people that way. That’s why Dr. Agrawal’s Reddit study is so important.”
Ali appreciates that these models “democratize” medical information. “But they also dilute it,” he said. “I encourage people to use large language models, but I also encourage them to review medical information with someone who has expertise in that field before taking a significant action.”
Another strategy is to use chatbots to explain primary sources. For example, upload an article about Crohn’s disease treatment guidelines and ask specific questions, rather than asking the chatbot to generate treatment advice on its own.
Even Agrawal, who studies the risks of medical chatbots, still finds herself turning to them. “It can be time-consuming to wade through research papers for specific answers,” she said.
During her pregnancy, she turned to AI for quick answers before her first appointment.
“I write a lot about where AI for medical information goes wrong, but I’ve used it myself. And I think that’s true for a lot of people now.”
Angela Spivey is assistant director and managing editor in the Office of Strategic Communications at Duke University School of Medicine.
Eamon Queeney is assistant director of multimedia and creative in the Office of Strategic Communications at Duke University School of Medicine.