Despite rapid healthcare digitization, extracting information from unstructured electronic health records (EHRs), such as nursing notes, remains challenging due to inconsistencies and ambiguities in clinical documentation. Generative large language models (LLMs) have emerged as promising tools for automating information extraction (IE); however, their application in real-world clinical settings, such as residential aged care (RAC), is limited by critical gaps. Prior studies have often focused on structured EHR data and conventional evaluation metrics such as accuracy and F1 score, overlooking aspects such as robustness, fairness, bias, and contextual relevance, particularly in unstructured clinical narratives. To address these gaps, this study develops a holistic evaluation framework for clinical IE from free-text nursing notes in Australian RAC. We systematically evaluate 17 LLMs, including general-purpose and healthcare-specific variants (e.g., LLaMA, Mistral, Gemini, T5), across retrieval-augmented generation (RAG) frameworks and few-shot learning configurations (one-shot, three-shot, four-shot, five-shot). The evaluation focuses on two clinical IE tasks: named entity recognition (NER) and summarization. Results reveal that LLaMA 3.1 achieved 88.58% accuracy and an 87.43% F1 score in NER, and an 88.18% F1 score and 83.15% relevance in summarization. However, robustness remained low (4.00% for NER, 4.31% for summarization) despite excellent fairness (99.9%) and minimal bias (0.11%) in both tasks. Further, healthcare-specific LLMs slightly outperformed general models, and RAG-based approaches (LangChain, LlamaIndex) yielded superior results. Task-specific optimal few-shot settings emerged: three-shot for NER and five-shot for summarization. This study provides a foundation for safely integrating generative AI into clinical decision support.
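The abstract reports accuracy and F1 for NER but does not say how those scores were computed. As a minimal sketch only, assuming exact-match scoring over sets of (text, type) pairs, an entity-level F1 could be calculated as follows. The `entity_f1` helper, the entity representation, and the sample entities are all hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: an entity is a (surface text, entity type) pair.
Entity = Tuple[str, str]


def entity_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over two collections of entities."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)

    # A prediction counts as correct only if both text and type match exactly.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative (made-up) annotations for a single nursing note.
gold_entities = [
    ("paracetamol", "MEDICATION"),
    ("fall", "EVENT"),
    ("dementia", "DIAGNOSIS"),
]
predicted_entities = [
    ("paracetamol", "MEDICATION"),
    ("dementia", "DIAGNOSIS"),
    ("walker", "DEVICE"),
]

scores = entity_f1(gold_entities, predicted_entities)
print(scores)
```

In this made-up case two of three predictions match two of three gold entities, so precision, recall, and F1 all equal 2/3. Partial-match or token-level scoring would give different numbers, which is one reason the exact scoring scheme matters when comparing models.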
DOI: http://dx.doi.org/10.1016/j.compbiomed.2025.111013
PLoS One
September 2025
Centre for Experimental Pathogen Host Research, School of Medicine, University College Dublin, Dublin, Ireland.
Background: Acute viral respiratory infections (AVRIs) rank among the most common causes of hospitalisation worldwide, imposing significant healthcare burdens and driving the development of pharmacological treatments. However, inconsistent outcome reporting across clinical trials limits evidence synthesis and its translation into clinical practice. A core outcome set (COS) for pharmacological treatments in hospitalised adults with AVRIs is essential to standardise trial outcomes and improve research comparability.
IEEE Comput Graph Appl
September 2025
Autonomous agents powered by Large Language Models are transforming AI, creating an imperative for the visualization area. However, our field's focus on a human in the sensemaking loop raises critical questions about autonomy, delegation, and coordination for such agentic visualization that preserve human agency while amplifying analytical capabilities. This paper addresses these questions by reinterpreting existing visualization systems with semi-automated or fully automatic AI components through an agentic lens.
Drug Saf
September 2025
The MITRE Corporation, 202 Burlington Rd, Bedford, MA, 01730, USA.
Acta Neurochir (Wien)
September 2025
Department of Neurosurgery, Istinye University, Istanbul, Turkey.
Background: Recent studies suggest that large language models (LLMs) such as ChatGPT are useful tools for medical students or residents when preparing for examinations. These studies, especially those conducted with multiple-choice questions, emphasize that the level of knowledge and response consistency of the LLMs are generally acceptable; however, further optimization is needed in areas such as case discussion, interpretation, and language proficiency. Therefore, this study aimed to evaluate the performance of six distinct LLMs for Turkish and English neurosurgery multiple-choice questions and assess their accuracy and consistency in a specialized medical context.