Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation.

Sami Barrit, Nathan Torcida, Aurelien Mazeraud, Sebastien Boulogne, Jeanne Benoit, Timothée Carette, Thibault Carron, Bertil Delsaut, Eva Diab, Hugo Kermorvant, Adil Maarouf, Sofia Maldonado Slootjes, Sylvain Redon, Alexis Robin, Sofiene Hadidane, Vincent Harlay, Vito Tota, Tanguy Madec, Alexandre Niset, Mejdeddine Al Barajraji

Brain Sci

Sciense, New York, NY 10013, USA.

Published: March 2025

Background/Objectives: Artificial intelligence (AI), particularly large language models (LLMs), has demonstrated versatility in various applications but faces challenges in specialized domains like neurology. This study evaluates a specialized LLM's capability and trustworthiness in complex neurological diagnosis, comparing its performance to that of neurologists in simulated clinical settings.

Methods: We deployed GPT-4 Turbo (OpenAI, San Francisco, CA, US) through Neura (Sciense, New York, NY, US), an AI infrastructure with a dual-database architecture integrating "long-term memory" and "short-term memory" components on a curated neurological corpus. Five representative clinical scenarios were presented to 13 neurologists and the AI system. Participants formulated differential diagnoses based on initial presentations, followed by definitive diagnoses after receiving conclusive clinical information. Two senior academic neurologists blindly evaluated all responses, while an independent investigator assessed the verifiability of AI-generated information.

Results: The AI achieved a significantly higher normalized score than the neurologists (86.17% vs. 55.11%, p < 0.001). For differential diagnosis questions, the AI scored 85% versus 46.15% for neurologists, and for final diagnosis, 88.24% versus 70.93%. The AI obtained 15 maximum scores in its 20 evaluations and responded in under 30 s, compared to the neurologists' average of 9 min. All AI-provided references were classified as relevant, with no hallucinatory content detected.

Conclusions: A specialized LLM demonstrated superior diagnostic performance compared to practicing neurologists across complex clinical challenges. This indicates that appropriately harnessed LLMs with curated knowledge bases can achieve domain-specific relevance in complex clinical disciplines, suggesting potential for AI as a time-efficient asset in clinical practice.
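As a reading aid, the headline figures in the abstract can be restated as simple differences and ratios. The sketch below is not part of the study's analysis; it only does arithmetic on the numbers quoted above, and the variable and function names are illustrative. The speed ratio is a lower bound that assumes the AI's reported ceiling of 30 s and the neurologists' reported average of 9 min.

```python
# Figures copied from the abstract; names are illustrative, not the authors'.
reported = {
    "overall": (86.17, 55.11),
    "differential diagnosis": (85.00, 46.15),
    "final diagnosis": (88.24, 70.93),
}


def gap_points(ai: float, neurologists: float) -> float:
    """Difference between two normalized scores, in percentage points."""
    return round(ai - neurologists, 2)


gaps = {name: gap_points(ai, neuro) for name, (ai, neuro) in reported.items()}

# Share of the AI's 20 evaluations that received the maximum score.
max_score_share = 15 / 20 * 100

# Lower bound on the speed ratio: 9 min average versus "under 30 s".
min_speedup = (9 * 60) / 30

for name, gap in gaps.items():
    print(f"{name}: AI ahead by {gap} percentage points")
print(f"maximum scores: {max_score_share:.0f}% of AI evaluations")
print(f"response time: at least {min_speedup:.0f}x faster than the neurologists' average")
```

These are descriptive restatements only; they say nothing about statistical significance, which the abstract reports separately (p < 0.001 for the overall score).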

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12025783
DOI: http://dx.doi.org/10.3390/brainsci15040347

Similar Publications

A plain language summary of the MIRACLE study: benralizumab in people in Asia with severe asthma.

Immunotherapy

September 2025

Guangzhou Institute of Respiratory Health, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.

Kefang Lai, Dejun Sun, Ranran Dai, Hae-Sim Park, Annika Åstrand

Integrating Generative Artificial Intelligence in Midwifery Education: Balancing Innovation, Ethics, and Academic Integrity.

J Midwifery Womens Health

September 2025

General Education Department Chair, Midwives College of Utah, Salt Lake City, Utah.

Megan Koontz, Stefanie Podlog

Applications driven by large language models (LLMs) are reshaping higher education by offering innovative tools that enhance learning, streamline administrative tasks, and support scholarly work. However, their integration into education institutions raises ethical concerns related to bias, misinformation, and academic integrity, necessitating thoughtful institutional responses. This article explores the evolving role of LLMs in midwifery higher education, providing historical context, key capabilities, and ethical considerations.

It's Hey Jude, not Hey Jade: Input Variation and the Emergence of the Infant Lexicon.

J Child Lang

September 2025

Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada.

Helen Buckler, Elizabeth K Johnson

A growing literature explores the representational detail of infants' early lexical representations, but no study has investigated how exposure to real-life acoustic-phonetic variation impacts these representations. Indeed, previous experimental work with young infants has largely ignored the impact of accent exposure on lexical development. We ask how routine exposure to accent variation affects 6-month-olds' ability to detect mispronunciations.

Resource Utilization for Brief Resolved Unexplained Events in a Pediatric and General Emergency Department.

Pediatr Emerg Care

September 2025

Albert Einstein College of Medicine.

Daniel M Fein, Leon Chen, Nina Samuel, Michael D Cabana

Objectives: The primary aim of this study was to compare resource utilization between lower and higher-risk brief resolved unexplained events (BRUE) in the general (GED) and pediatric (PED) emergency departments.

Methods: We conducted a retrospective chart review of BRUE cases from a large health system over 6-and-a-half years. Our primary outcome was the count of diagnostic tests per encounter.

Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.

J Imaging Inform Med

September 2025

Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.

Fabio Dennstädt, Simon Fauser, Nikola Cihoric, Max Schmerder, Paolo Lombardo

Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs, deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure.
