Background and study aims: Current general-purpose artificial intelligence (AI) large language models (LLMs) demonstrate limited efficacy in clinical medicine and are often confined to question-answering, documentation, and literature summarization roles. We developed GastroGPT, a proof-of-concept specialty-specific, multi-task, clinical LLM, and evaluated its performance against leading general-purpose LLMs across key gastroenterology tasks and diverse case scenarios.
Methods: In this structured analysis, GastroGPT was compared with three state-of-the-art general-purpose LLMs (LLM-A: GPT-4, LLM-B: Bard, LLM-C: Claude). Models were assessed on seven clinical tasks and overall performance across 10 simulated gastroenterology cases varying in complexity, frequency, and patient demographics. Standardized prompts facilitated structured comparisons. A blinded expert panel rated model outputs per task on a 10-point Likert scale, judging clinical utility. Comprehensive statistical analyses were conducted.
Results: A total of 2,240 expert ratings were obtained. GastroGPT achieved significantly higher mean overall scores (8.1 ± 1.8) compared with GPT-4 (5.2 ± 3.0), Bard (5.7 ± 3.3), and Claude (7.0 ± 2.7) (all P < 0.001). It outperformed comparators in six of seven tasks (P < 0.05), the exception being follow-up planning. GastroGPT demonstrated superior score consistency (variance 34.95) versus general models (97.4-260.35) (P < 0.001). Its performance remained consistent across case complexities and frequencies, unlike the comparators (P < 0.001). Multivariate analysis revealed that model type significantly predicted performance (P < 0.001).
Conclusions: This study pioneered the development of a specialty-specific, clinically oriented AI model and its comparison with general-purpose LLMs. GastroGPT demonstrated superior utility overall and on key gastroenterology tasks, highlighting the potential for tailored, task-focused AI models in medicine.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12371664 | PMC |
http://dx.doi.org/10.1055/a-2637-2163 | DOI Listing |
Front Med (Lausanne)
August 2025
Department of Oncology, Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Background: This study evaluates how AI enhances EHR efficiency by comparing a lung cancer-specific LLM with general-purpose models (DeepSeek, GPT-3.5) and clinicians across expertise levels, assessing accuracy and completeness in complex lung cancer pathology documentation and task load changes pre-/post-AI implementation.
Methods: This study analyzed 300 lung cancer cases (Shanghai Chest Hospital) and 60 TCGA cases, split into training/validation/test sets.
Large Language Models (LLMs), AI agents and co-scientists promise to accelerate scientific discovery across fields ranging from chemistry to biology. Bioinformatics, the analysis of DNA, RNA and protein sequences, plays a crucial role in biological research and is especially amenable to AI-driven automation given its computational nature. Here, we assess the bioinformatics capabilities of three popular general-purpose LLMs on a set of tasks covering basic analytical questions that include code writing and multi-step reasoning in the domain.
Front Robot AI
August 2025
Information Technologies Institute, The Centre for Research and Technology Hellas, Thessaloniki, Greece.
Agentic AI refers to autonomous systems that can perceive their environment, make decisions, and take actions to achieve goals with minimal or no human intervention. Recent advances in Large Language Models (LLMs) have opened new pathways to imbue robots with such "agentic" behaviors by leveraging the LLMs' vast knowledge and reasoning capabilities for planning and control. This survey provides the first comprehensive exploration of the integration of LLM-based robotic systems into agentic behaviors that have been validated in real-world applications.
Comput Biol Med
August 2025
School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, Australia.
Despite rapid healthcare digitization, extracting information from unstructured electronic health records (EHRs), such as nursing notes, remains challenging due to inconsistencies and ambiguities in clinical documentation. Generative large language models (LLMs) have emerged as promising tools for automating information extraction (IE); however, their application in real-world clinical settings, such as residential aged care (RAC), is limited by critical gaps. Prior studies have often focused on structured EHR data and conventional evaluation metrics such as accuracy and F1 score, overlooking critical aspects like robustness, fairness, bias, and contextual relevance, particularly in unstructured clinical narratives.
Biomedicines
July 2025
Department of Integrative Translational Sciences, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA.
The RTK-RAS signaling cascade is a central axis in colorectal cancer (CRC) pathogenesis, governing cellular proliferation, survival, and therapeutic resistance. Somatic alterations in key pathway genes, including KRAS, NRAS, BRAF, and EGFR, are pivotal to clinical decision-making in precision oncology. However, the integration of these genomic events with clinical and demographic data remains hindered by fragmented resources and a lack of accessible analytical frameworks.