Background: Artificial intelligence (AI) applications in health care have been effective in many areas of medicine, but they are often trained for a single task using labelled data, making deployment and generalisability challenging. How well a general-purpose AI language model performs diagnosis and triage relative to physicians and laypeople is not well understood.
Methods: We compared the diagnostic and triage accuracy of Generative Pre-trained Transformer 3 (GPT-3) on 48 validated synthetic case vignettes (<50 words; sixth-grade reading level or below) of both common (eg, viral illness) and severe (eg, heart attack) conditions with that of a nationally representative sample of 5000 lay people from the USA, who could use the internet to find the correct options, and of 21 practising physicians at Harvard Medical School. There were 12 vignettes for each of four triage categories: emergent, within 1 day, within 1 week, and self-care. The correct diagnosis and triage category (ie, ground truth) for each vignette was determined by two general internists at Harvard Medical School. For each vignette, human respondents and GPT-3 were prompted to list diagnoses in order of likelihood, and the vignette was marked as correct if the ground-truth diagnosis was among the top three listed diagnoses. For triage accuracy, we examined whether the triage selected by the human respondents and by GPT-3 was exactly correct according to the four triage categories, or matched a dichotomised triage variable (emergent or within 1 day vs within 1 week or self-care). We estimated GPT-3's diagnostic and triage confidence on a given vignette using a modified bootstrap resampling procedure, and examined how well calibrated that confidence was by computing calibration curves and Brier scores. We also performed a subgroup analysis by case acuity, and an error analysis of triage advice to characterise how GPT-3's advice might affect patients using this tool to decide whether they should seek medical care immediately.
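The scoring rules described in the Methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function names, the category strings, and the case-insensitive string match for diagnoses are all assumptions (the abstract does not say how a listed diagnosis was judged to match the ground truth). The Brier score is the standard mean squared difference between a stated confidence and the 0/1 outcome.

```python
from typing import Sequence

# Hypothetical labels for the four triage categories named in the abstract.
EMERGENT = "emergent"
ONE_DAY = "within 1 day"
ONE_WEEK = "within 1 week"
SELF_CARE = "self-care"

# Dichotomised variable: emergent or within 1 day vs within 1 week or self-care.
URGENT = {EMERGENT, ONE_DAY}


def top3_correct(ranked_diagnoses: Sequence[str], truth: str) -> bool:
    """True if the ground-truth diagnosis is among the first three listed.

    Assumption: a simple case-insensitive string match stands in for
    whatever adjudication the study actually used.
    """
    return truth.lower() in [d.lower() for d in ranked_diagnoses[:3]]


def triage_exact(predicted: str, truth: str) -> bool:
    """Exact match on the four-level triage category."""
    return predicted == truth


def triage_dichotomised(predicted: str, truth: str) -> bool:
    """Match on the two-level (urgent vs non-urgent) triage variable."""
    return (predicted in URGENT) == (truth in URGENT)


def brier_score(confidences: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared difference between confidence and the 0/1 outcome."""
    if len(confidences) != len(outcomes) or len(confidences) == 0:
        raise ValueError("confidences and outcomes must be non-empty and equal in length")
    total = sum((c - float(o)) ** 2 for c, o in zip(confidences, outcomes))
    return total / len(confidences)
```

A lower Brier score indicates better calibration: 0 means every confidence matched its outcome exactly, and a constant confidence of 0·5 always scores 0·25.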
Findings: Among all cases, GPT-3 replied with the correct diagnosis in its top three for 88% (42/48, 95% CI 75-94) of cases, compared with 54% (2700/5000, 53-55) for lay individuals (p<0·0001) and 96% (637/666, 94-97) for physicians (p=0·012). GPT-3 triaged 70% of cases correctly (34/48, 57-82), versus 74% (3706/5000, 73-75; p=0·60) for lay individuals and 91% (608/666, 89-93; p<0·0001) for physicians. As measured by the Brier score, GPT-3's confidence in its top prediction was reasonably well calibrated for diagnosis (Brier score=0·18) and triage (Brier score=0·22). We observed an inverse relationship between case acuity and GPT-3 accuracy (p<0·0001), with a fitted trend line showing an 8·33% decrease in accuracy for every one-level increase in case acuity. In the triage error analysis, GPT-3 deprioritised truly emergent cases in seven instances.
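The proportions in the Findings can be checked with a short calculation. The abstract does not state how the 95% CIs were computed; the sketch below assumes a Wilson score interval, which reproduces the reported bounds for the two GPT-3 proportions (75-94 for 42/48 and 57-82 for 34/48) when rounded to whole percentages. The function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not name its CI method; Wilson is used
    here only because it matches the reported GPT-3 bounds.
    z = 1.959964 corresponds to a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# GPT-3 diagnosis (42/48) and triage (34/48), as reported in the Findings.
for label, k, n in [("diagnosis", 42, 48), ("triage", 34, 48)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f}% ({100 * lo:.0f}-{100 * hi:.0f})")
```

With only 48 vignettes the intervals for GPT-3 are wide (roughly 20 to 25 percentage points), which is worth keeping in mind when comparing against the much narrower intervals for the 5000 lay respondents.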
Interpretation: A general-purpose AI language model without any content-specific training could perform diagnosis at a level close to, but below, that of physicians and better than that of lay individuals. GPT-3's triage performance was inferior to that of physicians, sometimes by a large margin, and was closer to that of lay individuals. Overall, the diagnostic performance of GPT-3 was comparable to that of physicians and significantly better than that of a typical person using a search engine.
Funding: The National Heart, Lung, and Blood Institute.
DOI: http://dx.doi.org/10.1016/S2589-7500(24)00097-9
BMC Emerg Med
September 2025
Department of Neurology and Clinical Neuroscience, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany.
Background: Identifying suspected anterior circulation large-vessel occlusion (aLVO) strokes during emergency calls could enhance dispatch efficiency, particularly in rural areas. However, data on emergency medical dispatchers' (EMDs) ability to recognize aLVO symptoms remain limited. This simulation study aimed to evaluate the feasibility of identifying side-specific arm paresis, side-specific conjugate eye deviation (CED), and aphasia during emergency calls by instructing layperson callers to perform brief, standardized examination steps.
Arch Gynecol Obstet
September 2025
Department of Obstetrics and Gynecology, University Medical Center Freiburg, Freiburg, Germany.
Objective: To investigate the clinical utility of diagnostic laparoscopy in guiding treatment strategy and surgical outcomes for patients with advanced-stage ovarian cancer, specifically regarding operability assessment and the likelihood of complete cytoreduction.
Methods: This retrospective cohort study analyzed 183 patients with histologically confirmed International Federation of Gynecology and Obstetrics (FIGO) stage III-IV ovarian cancer treated with curative intent between January 2018 and December 2023 at a tertiary referral center. Patients were divided into two groups: those who underwent diagnostic laparoscopy prior to primary treatment (n = 80) and those managed without laparoscopy (n = 103).
J Dermatolog Treat
December 2025
Department of Dermatology, University Hospital Zurich, Zurich, Switzerland.
Objectives: The aim of this study is to evaluate the potential of online consultation services in a Swiss dermatological clinic as a tool for triage, focusing on time savings, patient satisfaction, and cost-effectiveness.
Methods: Over a period of 30 months, data were generated from a publicly available store-and-forward teledermatological platform (www.derma2go.
PLoS One
September 2025
Department of Obstetrics and Gynecology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
Cervical cancer remains the leading cause of cancer death among women in sub-Saharan Africa and is more severe in high HIV-burdened countries due to persistent high-risk human papillomavirus (hrHPV). In 2021, the World Health Organization recommended primary hrHPV testing for cervical cancer screening; however, optimal triage strategies following positive hrHPV tests remain unclear. We conducted a prospective cost analysis of triage methods for positive hrHPV results among women living with and without HIV in Gaborone, Botswana.
Cureus
August 2025
General Surgery, Sree Balaji Medical College and Hospital, Chennai, IND.
Background: Non-traumatic abdominal emergencies (NTAEs) represent a diverse group of acute abdominal conditions that arise spontaneously and require prompt evaluation and management. These include common presentations such as acute appendicitis, ureteric colic, and pancreatitis. With the rising prevalence of non-communicable diseases like diabetes and hypertension, the clinical profile and complexity of these emergencies are increasing.