Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

20

Article Abstract

In a first-of-its-kind study, we assessed the capabilities of large language models (LLMs) in making complex decisions in haematopoietic stem cell transplantation. The evaluation was conducted not only for Generative Pre-trained Transformer 4 (GPT-4) but also conducted on other artificial intelligence models: PaLm 2 and Llama-2. Using detailed haematological histories that include both clinical, molecular and donor data, we conducted a triple-blind survey to compare LLMs to haematology residents. We found that residents significantly outperformed LLMs (p = 0.02), particularly in transplant eligibility assessment (p = 0.01). Our triple-blind methodology aimed to mitigate potential biases in evaluating LLMs and revealed both their promise and limitations in deciphering complex haematological clinical scenarios.

Download full-text PDF

Source
http://dx.doi.org/10.1111/bjh.19200DOI Listing

Publication Analysis

Top Keywords

large language
8
language models
8
haematopoietic stem
8
stem cell
8
cell transplantation
8
evaluating performance
4
performance large
4
models haematopoietic
4
transplantation decision-making
4
decision-making first-of-its-kind
4

Similar Publications

Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.

JMIR Med Inform

September 2025

Departments of Radiology, The Third Affiliated Hospital, Sun Yat-Sen University, 600 Tianhe Road, Guangzhou, Guangdong, 510630, China, 86 18922109279, 86 20852523108.

Background: Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.

Objective: To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.

View Article and Find Full Text PDF

Background: With the development of artificial intelligence, obtaining patient-centered medical information through large language models (LLMs) is crucial for patient education. However, existing digital resources in online health care have heterogeneous quality, and the reliability and readability of content generated by various AI models need to be evaluated to meet the needs of patients with different levels of cultural literacy.

Objective: This study aims to compare the accuracy and readability of different LLMs in providing medical information related to gynecomastia, and explore the most promising science education tools in practical clinical applications.

View Article and Find Full Text PDF

This study aimed to assess the ability of two off-the-shelf large language models, ChatGPT and Gemini, to support the design of pharmacoepidemiological studies. We assessed 48 study protocols of pharmacoepidemiological studies published between 2018 and 2024, covering various study types, including disease epidemiology, drug utilization, safety, and effectiveness. The coherence (i.

View Article and Find Full Text PDF

Artificial intelligence (AI), particularly large language models (LLMs), offers the potential to augment clinical decision-making, including in palliative care pharmacy, where personalized treatment and assessments are important. Despite the growing interest in AI, its role in clinical reasoning within specialized fields such as palliative care remains uncertain. This study examines the performance of four commercial-grade LLMs on a Script Concordance Test (SCT) designed for pharmacy students in a pain and palliative care elective, comparing AI outputs with human learners' performance at baseline.

View Article and Find Full Text PDF

Purpose: Prior studies of vocal auditory-motor control in people with hyperfunctional voice disorders (HVDs) have found evidence of unusually large responses to auditory feedback perturbations of fundamental frequency (0) and more variable voice onset times in unperturbed speech. However, it is unknown whether people with HVDs perform similarly to people with typical voices when asked to make small changes in vocal parameters in volitional tasks. The purpose of this study was to compare performance on minimal movement tasks for 0 and intensity in people with and without HVDs.

View Article and Find Full Text PDF