Background: While large language models (LLMs) are increasingly used in medicine, their effectiveness compared with human experts remains unclear. This study evaluates the quality and empathy of Expert + AI, human experts, and LLM responses in neuro-ophthalmology.
Methods: This randomized, masked, multicenter cross-sectional study was conducted from June to July 2023. We randomly assigned 21 neuro-ophthalmology questions to 13 experts. Each expert provided an answer and then edited a ChatGPT-4-generated response, timing both tasks. In addition, 5 LLMs (ChatGPT-3.5, ChatGPT-4, Claude 2, Bing, Bard) generated responses. Anonymized and randomized responses from Expert + AI, human experts, and LLMs were evaluated by the remaining 12 experts. The main outcome was the mean score for quality and empathy, rated on a 1-5 scale.
Results: Significant differences existed between response types for both quality and empathy (P < 0.0001 for each). For quality, Expert + AI (4.16 ± 0.81) performed the best, followed by GPT-4 (4.04 ± 0.92), GPT-3.5 (3.99 ± 0.87), Claude (3.6 ± 1.09), Expert (3.56 ± 1.01), Bard (3.5 ± 1.15), and Bing (3.04 ± 1.12). For empathy, Expert + AI (3.63 ± 0.87) had the highest score, followed by GPT-4 (3.6 ± 0.88), Bard (3.54 ± 0.89), GPT-3.5 (3.5 ± 0.83), Bing (3.27 ± 1.03), Expert (3.26 ± 1.08), and Claude (3.11 ± 0.78). Expert + AI performed better than Expert for both quality (P < 0.0001) and empathy (P = 0.002). Time taken for expert-created and expert-edited LLM responses was similar (P = 0.75).
Conclusions: Expert-edited LLM responses had the highest expert-determined ratings of quality and empathy, warranting further exploration of their potential benefits in clinical settings.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445389 | PMC |
http://dx.doi.org/10.1097/WNO.0000000000002145 | DOI Listing |
JMIR Cancer
September 2025
iCARE Secure Data Environment & Digital Collaboration Space, NIHR Imperial Biomedical Research Centre, London, United Kingdom.
Background: Electronic health records (EHRs) are a cornerstone of modern health care delivery, but their current configuration often fragments information across systems, impeding timely and effective clinical decision-making. In gynecological oncology, where care involves complex, multidisciplinary coordination, these limitations can significantly impact the quality and efficiency of patient management. Few studies have examined how EHR systems support clinical decision-making from the perspective of end users.
Traffic Inj Prev
September 2025
Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin.
Objective: Assessment of submarining occurrence in PMHS (Post-Mortem Human Subject) testing can be challenging, particularly for obese PMHS. This study investigates varied kinetic and kinematic response parameters as potential indicators of submarining. Data from 36 whole-body PMHS frontal sled tests conducted under varying boundary conditions were analyzed, incorporating three spring-controlled seat configurations, two extreme anthropometric profiles, two crash pulses, and two seatback angles.
Arq Gastroenterol
September 2025
Faculdade de Medicina da Universidade de São Paulo, Departamento de Gastroenterologia, São Paulo, SP, Brasil.
Background: Accurate evaluation of the invasion depth of superficial esophageal squamous cell carcinoma (SESCC) is crucial for optimal treatment. While magnifying endoscopy (ME) using the Japanese Esophageal Society (JES) classification is reported as the most accurate method to predict invasion depth, its efficacy has not been tested in the Western world. This study aims to evaluate the interobserver agreement of the JES classification for SESCC and its accuracy in estimating invasion depth in a Brazilian tertiary hospital.
PLoS One
September 2025
School of Computer Science, CHART Laboratory, University of Nottingham, Nottingham, United Kingdom.
Background And Objective: Male fertility assessment through sperm morphology analysis remains a critical component of reproductive health evaluation, as abnormal sperm morphology is strongly correlated with reduced fertility rates and poor assisted reproductive technology outcomes. Traditional manual analysis performed by embryologists is time-intensive, subjective, and prone to significant inter-observer variability, with studies reporting up to 40% disagreement between expert evaluators. This research presents a novel deep learning framework combining Convolutional Block Attention Module (CBAM) with ResNet50 architecture and advanced deep feature engineering (DFE) techniques for automated, objective sperm morphology classification.