ChatGPT Health: Misses Half of Emergencies
25 Feb 2026 · Written by Lorenzo Pellegrini
Study Reveals ChatGPT Health Misses Over Half of Medical Emergencies in Triage Testing
A recent study from Mount Sinai researchers exposes critical flaws in ChatGPT Health's ability to triage medical emergencies: the tool under-triaged serious cases nearly half the time, raising urgent questions about relying on AI for health advice.
What the Mount Sinai Study Uncovered
Researchers at the Icahn School of Medicine at Mount Sinai conducted the first independent evaluation of ChatGPT Health, testing it across 960 interactions based on 60 clinician-created scenarios spanning 21 medical specialties. These scenarios ranged from minor issues suitable for home care to life-threatening emergencies, with urgency levels determined by three independent physicians using guidelines from 56 medical societies.
The tool was probed under 16 varied conditions, including differences in patient race, gender, social factors like symptom minimization, and barriers such as lack of insurance or transportation. Despite recognizing dangers in its explanations, ChatGPT Health frequently failed to recommend appropriate emergency care.
Alarming Failure Rates in Emergency Triage
The study found a 48% failure rate for emergency conditions overall, meaning ChatGPT Health under-triaged nearly half of the cases physicians deemed critical. It performed better on textbook emergencies like strokes or anaphylaxis but struggled with nuanced threats such as diabetic ketoacidosis or impending respiratory failure, missing 52% of those.
- Non-urgent cases saw a 35% failure rate, often over-triaging mild issues.
- The AI sometimes identified warning signs, like early respiratory failure in an asthma scenario, yet advised waiting instead of urgent action.
- Factors like patients downplaying symptoms heavily influenced recommendations toward less urgent care.
Social determinants such as race, gender, or access barriers had minimal impact, highlighting inconsistency rather than bias as the core issue.
Contrasting Views on AI in Emergency Triage
While the Mount Sinai findings question ChatGPT Health's reliability, other research paints a more optimistic picture for general AI triage tools. One study reported 76.6% accuracy in assessing urgency via the Emergency Severity Index, with high specificity for high-acuity cases and strong agreement on life-threatening conditions. These results suggest potential for AI integration in emergency departments, though not as a standalone solution.
Psychiatric triage evaluations also show promise for GPT models in specific domains like youth mental health, though comprehensive testing remains essential.
Implications for AI in Healthcare
This research underscores blind spots in consumer AI tools, where clinical judgment proves irreplaceable, especially in subtle emergencies. Lead author Dr. Ashwin Ramaswamy emphasized the need to verify whether these systems reliably direct users to emergency rooms during real crises.
High utilization of such tools amplifies the stakes, as mis-triage could delay vital care and worsen outcomes.
Conclusion
The Mount Sinai study signals caution: ChatGPT Health excels in obvious cases but falters where nuance matters most, under-triaging nearly half of serious emergencies overall, and over half of the most nuanced ones, in controlled tests.
Healthcare professionals and users should view AI as a supportive aid, not a replacement for expert evaluation. Future refinements may address these gaps, but for now, trust human oversight in urgent health decisions.
ChatGPT Health performs best at the clear-cut extremes and fails most often where urgency is ambiguous, a pattern that exposes a deeper paradox: by mimicking human clinicians' reluctance to over-triage under resource scarcity, it trades safety for a false economy in consumer settings, where hesitation can be fatal.
