Background: Kawasaki disease (KD) presents complex clinical challenges in diagnosis, treatment, and long-term management, requiring a comprehensive understanding by both parents and healthcare providers. With advancements in artificial intelligence (AI), large language models (LLMs) have shown promise in supporting medical practice. This study aims to evaluate and compare the appropriateness and comprehensibility of different LLMs in answering clinically relevant questions about KD, and to assess the impact of different prompting strategies.
Methods: Twenty-five questions were formulated, incorporating three prompting strategies: No prompting (NO), Parent-friendly (PF), and Doctor-level (DL). These questions were input into three LLMs: ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Responses were evaluated based on appropriateness, educational quality, comprehensibility, cautionary statements, references, and potential misinformation, using Information Quality Grade, Global Quality Scale (GQS), Flesch Reading Ease (FRE) score, and word count.
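For readers unfamiliar with the comprehensibility metric named above, the Flesch Reading Ease (FRE) score is a standard readability formula in which higher scores indicate easier text. The sketch below is a minimal illustration of how such a score can be computed; it is not the study's actual pipeline, and the vowel-group syllable heuristic is an assumption made here for brevity rather than the syllabification used by published readability tools.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic (assumption): count runs of vowels, at least 1 per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Sentences approximated by terminal punctuation runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Standard FRE formula: higher values indicate easier reading.
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# Example: score a short sentence a parent might read.
print(round(flesch_reading_ease(
    "Kawasaki disease causes fever and rash in young children."), 1))
```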
Results: Significant differences were found among the LLMs in response educational quality, accuracy, and comprehensibility (p < 0.001). Claude 3.5 Sonnet provided the highest proportion of completely correct responses (51.1%) and achieved the highest median GQS score (5.0), significantly outperforming GPT-4o (4.0) and Gemini 1.5 Pro (3.0). Gemini 1.5 Pro achieved the highest FRE score (31.5) and the highest proportion of responses assessed as comprehensible (80.4%). Prompting strategies significantly affected LLM responses: Claude 3.5 Sonnet with DL prompting had the highest completely correct rate (81.3%), while PF prompting yielded the most acceptable responses (97.3%). Gemini 1.5 Pro showed minimal variation across prompts but excelled in comprehensibility (98.7% under PF prompting).
Conclusion: This study indicates that LLMs have great potential in providing information about KD, but their use requires caution due to quality inconsistencies and misinformation risks. Significant discrepancies existed across LLMs and prompting strategies. Claude 3.5 Sonnet offered the best response quality and accuracy, while Gemini 1.5 Pro excelled in comprehensibility. PF prompting with Claude 3.5 Sonnet is the most suitable option for parents seeking KD information. As AI evolves, expanding research and refining models are crucial to ensuring reliable, high-quality information.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11994668 | PMC |
| http://dx.doi.org/10.3389/frai.2025.1571503 | DOI Listing |
Pediatr Surg Int
September 2025
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang Province, 310018, People's Republic of China.
Comput Biol Med
August 2025
The First People's Hospital of Foshan, Foshan City, China.
Brain Tumor Segmentation (BTS) is crucial for accurate diagnosis and treatment planning, but existing CNN and Transformer-based methods often struggle with feature fusion and limited training data. While recent large-scale vision models like Segment Anything Model (SAM) and CLIP offer potential, SAM is trained on natural images, lacking medical domain knowledge, and its decoder struggles with accurate tumor segmentation. To address these challenges, we propose the Medical SAM-Clip Grafting Network (MSCG), which introduces a novel SC-grafting module.
Risk Anal
September 2025
Edward J. Bloustein School, Rutgers University, New Brunswick, New Jersey, USA.
This AI-assisted review article offers a dual review: a book review of Living with Risk in the Late Roman World by Cam Grey, and a critical review of the current potential of large language models (LLMs), specifically ChatGPT's DeepResearch mode, to assist in thoughtful and scholarly book reviewing within risk science. Grey's book presents an innovative reconstruction of how communities in the late Roman Empire perceived and adapted to chronic environmental and societal risks, emphasizing spatial variability, cultural interpretation, and the normalization of uncertainty. Drawing on commentary from a human reviewer and a parallel AI-assisted analysis, we compare the distinct strengths and limitations of each approach.
J Dent
September 2025
Dental Clinic Post-Graduate Program, University Center of the State of Pará, Belém, Pará, Brazil.
Objective: This study evaluated the coherence, consistency, and diagnostic accuracy of eight AI-based chatbots in clinical scenarios related to dental implants.
Methods: A double-blind clinical experimental study was carried out between February and March 2025 to evaluate eight AI-based chatbots using six fictional cases simulating peri-implant mucositis and peri-implantitis. Each chatbot answered five standardized clinical questions across three independent runs per case, generating 720 binary outputs (8 chatbots × 6 cases × 5 questions × 3 runs).
Am J Pharm Educ
September 2025
Department of Pharmacotherapy, University of Utah College of Pharmacy, 30 South 2000 East, Salt Lake City, Utah 84112.
The accelerating adoption of artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT, has raised critical questions about the role of pharmacists and the potential for AI to substitute for human expertise in pharmaceutical care. Grounded in Porter's Five Forces framework, specifically the threat of substitutes, this commentary explores whether AI can adequately fulfill the complex and relational functions of pharmacists in delivering care to patients. Drawing from foundational definitions of pharmaceutical care and economic theories of substitution, the paper examines both historical and emerging competitors to pharmacist-provided services, including physicians, nurses, and now AI-powered tools.