
Article Abstract

Objectives: Large language models (LLMs) show promise as clinical consultation tools and may assist optic neuritis patients, though research on their performance in this area is limited. Our study aims to assess and compare the performance of four commonly used LLM chatbots (Claude-2, ChatGPT-3.5, ChatGPT-4.0, and Google Bard) in addressing questions related to optic neuritis.

Methods: We curated 24 optic neuritis-related questions and had three ophthalmologists rate each chatbot's responses on two three-point scales, one for accuracy and one for comprehensiveness. We also assessed the readability of the responses using four readability scales. The final scores revealed performance differences among the four LLM chatbots.
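The abstract does not name the four readability scales used, so the sketch below assumes four widely used formulas (Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog, and SMOG), computed with the Python textstat package. It illustrates the readability step only and is not the study's actual pipeline.

# Readability scoring sketch (assumed scales; requires: pip install textstat)
import textstat

def readability_report(response_text):
    """Score one chatbot response on four common readability formulas."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(response_text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(response_text),
        "gunning_fog": textstat.gunning_fog(response_text),
        "smog_index": textstat.smog_index(response_text),
    }

sample = ("Optic neuritis is an inflammation of the optic nerve that often "
          "presents with acute, painful, monocular vision loss.")
print(readability_report(sample))

For context, a Flesch-Kincaid grade of roughly 13 or above corresponds to university-level reading proficiency, the threshold the study reports all responses exceeded.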

Results: The average total accuracy scores (out of 9) were: ChatGPT-4.0 (7.62 ± 0.86), Google Bard (7.42 ± 1.20), ChatGPT-3.5 (7.21 ± 0.70), and Claude-2 (6.44 ± 1.07). ChatGPT-4.0 (P = 0.0006) and Google Bard (P = 0.0015) were significantly more accurate than Claude-2. In addition, 62.5% of ChatGPT-4.0's responses were rated "Excellent," followed by 58.3% for Google Bard, both higher than Claude-2's 29.2% (all P ≤ 0.042) and ChatGPT-3.5's 41.7%. Both Claude-2 and Google Bard had 8.3% "Deficient" responses. Comprehensiveness scores were similar among the four LLMs (P = 0.1531). Notably, all responses required at least a university-level reading proficiency.
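The abstract reports P-values for the accuracy comparisons but does not name the statistical test. The sketch below assumes a nonparametric approach (a Kruskal-Wallis omnibus test plus a pairwise Mann-Whitney U test) via scipy; the score arrays are random placeholders seeded to mimic the reported means and standard deviations, not the study's data.

# Illustrative accuracy comparison (assumed tests; placeholder data)
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical per-question total accuracy scores, max 9 (3 raters x 3 points)
scores = {
    "ChatGPT-4.0": rng.normal(7.62, 0.86, 24).clip(0, 9),
    "Google Bard": rng.normal(7.42, 1.20, 24).clip(0, 9),
    "ChatGPT-3.5": rng.normal(7.21, 0.70, 24).clip(0, 9),
    "Claude-2": rng.normal(6.44, 1.07, 24).clip(0, 9),
}

# Omnibus test across all four models
h_stat, p_all = kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_all:.4f}")

# Pairwise comparison, e.g. ChatGPT-4.0 versus Claude-2
u_stat, p_pair = mannwhitneyu(scores["ChatGPT-4.0"], scores["Claude-2"])
print(f"ChatGPT-4.0 vs Claude-2: U = {u_stat:.1f}, P = {p_pair:.4f}")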

Conclusion: LLM chatbots hold immense potential as clinical consultation tools for optic neuritis, but they require further refinement and proper evaluation strategies before deployment to ensure reliable and accurate performance.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12238082
DOI: http://dx.doi.org/10.3389/fmed.2025.1516442

Publication Analysis

Top Keywords

google bard (16)
large language (12)
optic neuritis (12)
questions optic (8)
clinical consultation (8)
consultation tools (8)
responses (5)
optic (5)
google (5)
evaluation comparison (4)

Similar Publications

Background: Artificial intelligence (AI) increasingly impacts medicine and medical specialties, including nephrology. Technologies such as large language models (LLMs), decision-support AI, and machine learning-powered predictive analytics enhance clinical care. These AI-driven tools show great potential in areas such as predicting the risk of chronic kidney disease (CKD), managing dialysis, supporting kidney transplantation, and treating CKD and diabetes-related kidney issues.

Artificial intelligence (AI) is increasingly being utilized as an informational resource, with chatbots attracting users for their ability to generate instantaneous responses. This study evaluates the understandability, actionability, readability, quality, and misinformation in medical information provided by four prominent chatbots: Bard, ChatGPT 3.5, Claude 2.

Objective: Large language models (LLMs) have advanced rapidly, but their utility in pediatric surgery remains uncertain. This study assessed the performance of three AI models, DeepSeek, Microsoft Copilot (GPT-4), and Google Bard, on the European Pediatric Surgery In-Training Examination (EPSITE).

Methods: We evaluated model performance using 294 EPSITE questions from 2021 to 2023.

Objectives: Generative AI interfaces like ChatGPT offer a new way to access health information, but it is unclear whether the information they present is as credible as that from traditional search engines. This study aimed to compare the credibility of vaccination information across generative AI interfaces and traditional search engines.

Study Design: Cross-sectional content analysis and comparison.
