
Are AI Chatbots Ready to Handle Stroke Care? Experts Say Not Yet

npj Digital Medicine

A recent study finds that three advanced large language model chatbots frequently give inadequate guidance across stroke prevention, diagnosis, treatment, and recovery. Experts stress that human supervision is needed to keep patient care safe and appropriate. Stroke remains a leading global cause of death and disability, which makes reliable, actionable guidance critical.

Conducted by researchers at National Taiwan University and the Harvard T.H. Chan School of Public Health, this international study assessed whether three AI chatbots (ChatGPT-4o, Claude 3 Sonnet, and Gemini Ultra 1.0) are fit to deliver reliable patient advice in stroke care scenarios.


The research team established a typical clinical scenario of a stroke patient, using the most common patient questions across four stages of care: prevention, early symptom recognition, acute treatment, and rehabilitation. These inquiries were developed with input from clinical experts to ensure realistic and relevant scenarios.

Each model was evaluated using three prompting techniques: Zero-Shot Learning (ZSL), Chain-of-Thought (COT), and Talking Out Your Thoughts (TOT). Four experienced stroke specialists, blinded to the model and prompt type, scored the outputs on accuracy, hallucinations, specificity, empathy, and actionability. A threshold of 60 out of 100, akin to the cutoff for Taiwan's medical-doctor qualification exam, separated acceptable from potentially unsafe patient guidance.
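To make the pass/fail rule concrete, here is a minimal Python sketch of how one response's ratings could be reduced to a single score and compared with the 60-point cutoff. The article does not say how the five criteria or the four raters were combined, so the equal-weight averaging, the "higher is better" direction for every criterion (including hallucinations), the function names, and the sample ratings are all illustrative assumptions, not the study's method or data.

```python
from statistics import mean

PASS_THRESHOLD = 60  # cutoff out of 100, as reported in the article

# The five criteria named in the article.
CRITERIA = ("accuracy", "hallucinations", "specificity", "empathy", "actionability")


def response_score(ratings):
    """Average one response's ratings over criteria, then over raters.

    `ratings` holds one dict per rater, mapping each criterion to a
    0-100 score. Equal weighting and "higher is better" for every
    criterion are assumptions made for this sketch.
    """
    per_rater = [mean(rater[c] for c in CRITERIA) for rater in ratings]
    return mean(per_rater)


def is_acceptable(ratings, threshold=PASS_THRESHOLD):
    """Return True when the averaged score reaches the cutoff."""
    return response_score(ratings) >= threshold


# Made-up ratings from four hypothetical raters for a single response.
example = [
    {"accuracy": 60, "hallucinations": 70, "specificity": 50, "empathy": 55, "actionability": 45},
    {"accuracy": 50, "hallucinations": 60, "specificity": 50, "empathy": 60, "actionability": 40},
    {"accuracy": 55, "hallucinations": 65, "specificity": 45, "empathy": 50, "actionability": 45},
    {"accuracy": 60, "hallucinations": 60, "specificity": 50, "empathy": 50, "actionability": 40},
]

print(response_score(example))  # 53
print(is_acceptable(example))   # False
```

Under these assumptions a response can score well on one criterion and still fall short overall, which is why a single averaged cutoff is a coarse but simple screen.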

Scores ranged between 48 and 56 across all stages, an improvement over previous evaluations but still below the 60-point benchmark for clinical competence. TOT prompting occasionally reached or exceeded 60 for prevention and rehabilitation queries, owing to greater empathy and clearer guidance, while ZSL prompts reduced hallucinations. However, no model passed consistently, and all struggled with acute treatment queries.

"While generative AI holds promise in narrowing healthcare gaps and easing worker shortages, particularly where specialist access is limited, our findings highlight its unreliability in high-risk conditions like stroke," said John Tayu Lee, Associate Professor at National Taiwan University and Senior Researcher at Harvard's Health Systems Innovation Lab.

"Enhancing chatbot interactions with precise prompts can refine responses but won't immediately make them medically adept. Clear queries lead to better replies, yet ensuring safe bedside advice demands collaboration between AI and clinicians," said Vincent Cheng-Sheng Li from National Taiwan University.

Prof. Rifat Atun, senior author of the study and Director at Harvard's Health Systems Innovation Lab, emphasized, "Generative AI could greatly improve global health equity by offering low-cost solutions widely accessible to all. However, deployment must be handled responsibly with strong governance, thorough clinical validation, and human oversight to ensure safety."

"AI technology is revolutionizing healthcare worldwide, bridging advanced computer science with real-world medical needs," said Dr. Wei Jou Duh, CEO of the NTU AI Research Center. "As newer models emerge, this rigorous evaluation framework will help assess their impact in clinical settings."
