ChatGPT struggles to answer medical questions, new research finds
ChatGPT might not be a cure-all for answers to medical questions, a new study suggests.
Researchers at Long Island University posed 39 medication-related queries to the free version of the artificial intelligence chatbot, all of which were real questions from the university’s College of Pharmacy drug information service. The software’s answers were then compared with responses written and reviewed by trained pharmacists.
The study found that ChatGPT provided accurate responses to only 10 of the questions, or about a quarter of the total. For the other 29 prompts, the answers were incomplete or inaccurate, or they did not address the questions.
The findings were presented Tuesday at the annual meeting of the American Society of Health-System Pharmacists in Anaheim, California.
ChatGPT, OpenAI’s experimental AI chatbot, was released in November 2022 and became the fastest-growing consumer application in history, with nearly 100 million people registering within two months.
Given that popularity, the researchers’ interest was sparked by concern that their students, other pharmacists and ordinary consumers would turn to resources like ChatGPT to explore questions about their health and medication plans, said Sara Grossman, an associate professor of pharmacy practice at Long Island University and one of the study’s authors.
Those queries, they found, often yielded inaccurate – or even dangerous – responses.
In one question, for example, researchers asked ChatGPT whether the Covid-19 antiviral medication Paxlovid and the blood-pressure lowering medication verapamil would react with each other in the body. ChatGPT responded that taking the two medications together would yield no adverse effects.
In reality, people who take both medications might have a large drop in blood pressure, which can cause dizziness and fainting. For patients taking both, clinicians often create patient-specific plans, including lowering the dose of verapamil or cautioning the person to get up slowly from a sitting position, Grossman said.
ChatGPT’s guidance, she added, would have put people in harm’s way.
“Using ChatGPT to address this question would put a patient at risk for an unwanted and preventable drug interaction,” Grossman wrote in an email to CNN.
When the researchers asked the chatbot for scientific references to support each of its responses, they found that the software could provide them for only eight of the questions they asked. And in each case, they were surprised to find that ChatGPT was fabricating references.
At first glance, the citations looked legitimate: They were often formatted appropriately, provided URLs and were listed under legitimate scientific journals. But when the team attempted to find the referenced articles, they realized that ChatGPT had given them fictional citations.
In one case, the researchers asked ChatGPT how to convert spinal injection doses of the muscle spasm medication baclofen to corresponding oral doses. Grossman’s team could not find a scientifically established dose conversion ratio, but ChatGPT put forth a single conversion rate and cited two medical organizations’ guidance, she said.
However, neither organization provides any official guidance on the dose conversion rate. In fact, the conversion factor that ChatGPT suggested had never been scientifically established. The software also provided an example calculation for the dose conversion but with a critical mistake: It mixed up units when calculating the oral dose, throwing off the dose recommendation by a factor of 1,000.
If a health care professional followed that guidance, Grossman said, they might give a patient an oral baclofen dose 1,000 times lower than required, which could cause withdrawal symptoms like hallucinations and seizures.
“There were numerous errors and ‘problems’ with this response and ultimately, it could have a profound impact on patient care,” she wrote.
The Long Island University study is not the first to raise concerns about ChatGPT’s fictional citations. Previous research has also documented that, when asked medical questions, ChatGPT can create deceptive forgeries of scientific references, even listing the names of real authors with previous publications in scientific journals.
Grossman, who had worked little with the software before the study, was surprised by how confidently ChatGPT synthesized information nearly instantaneously, producing answers that would take trained professionals hours to compile.
“The responses were phrased in a very professional and sophisticated manner, and it just seemed it can contribute to a sense of confidence in the accuracy of the tool,” she said. “A user, a consumer, or others that may not be able to discern can be swayed by the appearance of authority.”
A spokesperson for OpenAI, the organization that develops ChatGPT, said it advises users not to rely on responses as a substitute for professional medical advice or treatment.
The spokesperson pointed to ChatGPT’s usage policies, which indicate that “OpenAI’s models are not fine-tuned to provide medical information.” The policy also states that the models should never be used to provide “diagnostic or treatment services for serious medical conditions.”
Although Grossman was unsure how many people use ChatGPT to address medication questions, she raised concerns that they could use the chatbot the same way they use search engines like Google to look for medical advice.
“People are always looking for instantaneous responses when they have this at their fingertips,” Grossman said. “I think that this is just another approach of using ‘Dr. Google’ and other seemingly easy methods of obtaining information.”
For online medical information, she recommended that consumers use governmental websites that provide reputable information, like the National Institutes of Health’s MedlinePlus page.
Still, Grossman doesn’t believe that online answers can replace the advice of a health care professional.
“[Websites are] maybe one starting point, but they can take their providers out of the picture when looking for information about medications that are directly applicable to them,” she said. “But it may not be applicable to the patients themselves because of their personal case, and every patient is different. So the authority here should not be removed from the picture: the healthcare professional, the prescriber, the patient’s physicians.”