Background: Generative artificial intelligence (AI) systems are increasingly deployed in clinical pharmacy; yet, systematic evaluation of their efficacy, limitations, and risks across diverse practice scenarios remains limited.
Objective: This study aims to quantitatively evaluate and compare the performance of 8 mainstream generative AI systems across 4 core clinical pharmacy scenarios (medication consultation, medication education, prescription review, and case analysis with pharmaceutical care) using a multidimensional framework.
Methods: Forty-eight clinically validated questions were selected via stratified sampling from real-world sources (eg, hospital consultations, clinical case banks, and national pharmacist training databases). Three researchers simultaneously tested 8 generative AI systems (ERNIE Bot, Doubao, Kimi, Qwen, GPT-4o, Gemini-1.5-Pro, Claude-3.5-Sonnet, and DeepSeek-R1) using standardized prompts within a single day (February 20, 2025). A double-blind scoring design was used, with 6 experienced clinical pharmacists (≥5 years of experience) evaluating the AI responses across 6 dimensions: accuracy, rigor, applicability, logical coherence, conciseness, and universality, each scored 0-10 per predefined criteria (eg, -3 for inaccuracy and -2 for incomplete rigor). Statistical analysis used one-way ANOVA with Tukey Honestly Significant Difference (HSD) post hoc testing and intraclass correlation coefficients (ICC) for interrater reliability (2-way random model). Qualitative thematic analysis identified recurrent errors and limitations.
Results: DeepSeek-R1 (DeepSeek) achieved the highest overall performance (mean composite score: medication consultation 9.4, SD 1.0; case analysis 9.3, SD 1.0), significantly outperforming others in complex tasks (P<.05). Critical limitations were observed across models. High-risk decision errors: 75% omitted critical contraindications (eg, ethambutol in optic neuritis). Lack of localization: 90% erroneously recommended macrolides for drug-resistant Mycoplasma pneumoniae (China's high-resistance setting), while only DeepSeek-R1 aligned with updated American Academy of Pediatrics (AAP) guidelines for pediatric doxycycline. Complex reasoning deficits: only Claude-3.5-Sonnet detected a gender-diagnosis contradiction (prostatic hyperplasia in a female patient); no model identified diazepam's 7-day prescription limit. Interrater consistency was lowest for conciseness in case analysis (ICC=0.70), reflecting evaluator disagreement on complex outputs. ERNIE Bot (Baidu) consistently underperformed (case analysis: 6.8, SD 1.5; P<.001 vs DeepSeek-R1).
Conclusions: While generative AI shows promise as a pharmacist assistance tool, significant limitations, including high-risk errors (eg, contraindication omissions), inadequate localization, and complex reasoning gaps, preclude autonomous clinical decision-making. Performance stratification highlights DeepSeek-R1's current advantage, but all systems require optimization in dynamic knowledge updating, complex scenario reasoning, and output interpretability. Future deployment must prioritize human oversight (human-AI co-review), ethical safeguards, and continuous evaluation frameworks.
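The statistical steps named in Methods (one-way ANOVA across systems and a 2-way random ICC across raters) can be illustrated with a minimal sketch. This is not the authors' analysis code. The function names are hypothetical, the scores are made-up placeholders rather than study data, and the single-rater, absolute-agreement form ICC(2,1) is an assumption, since the abstract only states "2-way random model". The Tukey HSD post hoc step is omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; each group is a list of scores."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k = len(groups)            # number of groups (e.g., AI systems)
    n_total = len(all_scores)  # total number of scores
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per rated response, one column per rater.
    """
    n = len(ratings)     # rated responses
    k = len(ratings[0])  # raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-response mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Placeholder scores only: three hypothetical systems, three scores each.
    systems = [[9.0, 9.5, 9.2], [8.1, 7.9, 8.4], [6.5, 7.0, 6.8]]
    print("ANOVA F:", round(one_way_anova_f(systems), 2))

    # Placeholder ratings: four responses scored by three hypothetical raters.
    ratings = [[9, 8, 9], [7, 7, 6], [5, 6, 5], [8, 9, 8]]
    print("ICC(2,1):", round(icc_2_1(ratings), 2))
```

A significant F only indicates that at least one system's mean differs; pairwise claims such as "ERNIE Bot vs DeepSeek-R1" require the post hoc comparison described in Methods.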
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288765 | PMC |
| http://dx.doi.org/10.2196/76128 | DOI Listing |
Jpn J Clin Oncol
September 2025
Department of Hematology and Oncology, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan.
Background: Amrubicin monotherapy has been used in Japan for patients with refractory, relapsed, small cell lung cancer (SCLC). However, the clinical guidelines do not specify a recommended initial dose for elderly patients. This retrospective study aimed to explore the appropriate initial dose of amrubicin for elderly patients with refractory, relapsed SCLC.
Crit Rev Anal Chem
September 2025
Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Mysore, India.
The miniaturization of separation platforms marks a transformative shift in analytical science, merging microfabrication, automation, and intelligent data integration to meet rising demands for portability, sustainability, and precision. This review critically synthesizes recent technological advances reshaping the field, from microinjection and preconcentration modules to compact, high-sensitivity detection systems including ultraviolet-visible (UV/Vis), fluorescence (FL), electrochemical detection (ECD), and mass spectrometry (MS). The integration of microcontrollers, AI-enhanced calibration routines, and IoT-enabled feedback loops has led to the rise of self-regulating analytical devices capable of real-time decision-making and autonomous operation.
Mol Pharm
September 2025
Johnson & Johnson, Translational PK/PD & Investigational Toxicology, Spring House, Pennsylvania 19002, United States.
Human intestinal permeability is a key determinant of the oral fraction absorbed of active pharmaceutical ingredients (APIs). This study evaluated the ability of an in-house canine Mdr1 (cMdr1) knockout (KO) Madin-Darby Canine Kidney (MDCK) cell line to correlate apparent permeability with human small intestinal permeability. Apparent permeability values of 16 reference compounds with high, medium, or low permeabilities were measured in the in-house cMdr1 KO MDCK protocol under pH gradient (6.
J Crohns Colitis
September 2025
Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina.
Background & Aims: Pregnancy can be a complex and risk-filled event for women with inflammatory bowel disease (IBD). High-quality studies in this population are lacking, with limited data on medications approved to treat IBD during pregnancy. For patients, limited knowledge surrounding pregnancy impacts pregnancy rates, medication adherence, and outcomes.
Clin Pediatr (Phila)
September 2025
College of Medicine, King Saud University, Riyadh, Saudi Arabia.
To optimize the deployment of Generative Artificial Intelligence in health care, it's essential for health care professionals (HCPs) to understand these technologies' capabilities and constraints. This study explores HCPs' initial impressions and experiences using ChatGPT, a Generative Pre-trained Transformer, in Pediatric Critical Care Units (PICUs). By conducting focus groups with a diverse set of HCPs, we aimed to assess their awareness, utilization, perceived benefits, and concerns about incorporating ChatGPT into their PICUs.