Generative Pre-trained Transformer 4 analysis of cardiovascular magnetic resonance reports in suspected myocarditis: A multicenter study.

Kenan Kaya, Carsten Gietzen, Robert Hahnfeldt, Maher Zoubi, Tilman Emrich, Moritz C Halfmann, Malte Maria Sieren, Yannic Elser, Patrick Krumm, Jan M Brendel, Konstantin Nikolaou, Nina Haag, Jan Borggrefe, Ricarda von Krüchten, Katharina Müller-Peltzer, Constantin Ehrengut, Timm Denecke, Andreas Hagendorff, Lukas Goertz, Roman J Gertz

J Cardiovasc Magn Reson

Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.

Published: December 2024


Background: Diagnosing myocarditis relies on multimodal data, including cardiovascular magnetic resonance (CMR), clinical symptoms, and blood values. The correct interpretation and integration of CMR findings require radiological expertise and knowledge. We aimed to investigate the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model, for report-based medical decision-making in the context of cardiac MRI for suspected myocarditis.

Methods: This retrospective study included CMR reports from 396 patients with suspected myocarditis, collected at eight centers. CMR reports and patient data, including blood values, age, and further clinical information, were provided to GPT-4 and to three radiologists with 1 (resident 1), 2 (resident 2), and 4 years (resident 3) of experience in CMR and knowledge of the 2018 Lake Louise Criteria. The final impression of each report, which states the radiological assessment of whether myocarditis is present, was withheld. The performance of GPT-4 and of the human readers was compared with a consensus reading by two board-certified radiologists with 8 and 10 years of experience in CMR. Sensitivity, specificity, and accuracy were calculated.
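As an illustration of the three metrics named above, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed for one reader against a reference standard. It is not the authors' analysis code. The function name `diagnostic_metrics`, the `DiagnosticMetrics` container, and the toy data are assumptions made for this example; `True` stands for "myocarditis present".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiagnosticMetrics:
    """Hypothetical container for the three metrics reported in the abstract."""
    sensitivity: float
    specificity: float
    accuracy: float


def diagnostic_metrics(reader_calls: Sequence[bool],
                       reference: Sequence[bool]) -> DiagnosticMetrics:
    """Compare one reader's binary calls with the reference standard.

    Both sequences hold one entry per patient; True means the diagnosis
    (here: myocarditis) is called present.
    """
    if len(reader_calls) != len(reference):
        raise ValueError("reader_calls and reference must have the same length")

    pairs = list(zip(reader_calls, reference))
    tp = sum(1 for call, ref in pairs if call and ref)          # true positives
    tn = sum(1 for call, ref in pairs if not call and not ref)  # true negatives
    fp = sum(1 for call, ref in pairs if call and not ref)      # false positives
    fn = sum(1 for call, ref in pairs if not call and ref)      # false negatives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain positive and negative cases")

    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),        # share of reference positives detected
        specificity=tn / (tn + fp),        # share of reference negatives cleared
        accuracy=(tp + tn) / len(pairs),   # share of all cases called correctly
    )


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the study.
    reference = [True, True, True, False, False, False, False, False]
    reader = [True, True, False, False, False, False, True, False]
    print(diagnostic_metrics(reader, reference))
```

In this sketch the consensus reading would take the role of `reference`, and the function would be called once per reader (GPT-4 and each resident).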

Results: GPT-4 achieved an accuracy of 83%, a sensitivity of 90%, and a specificity of 78%. This was comparable to the physician with 1 year of experience (R1: 86%, 90%, 84%; p = 0.14) and lower than the more experienced physicians (R2: 89%, 86%, 91%; p = 0.007 and R3: 91%, 85%, 96%; p < 0.001). The diagnostic performance of GPT-4 and of the human readers was higher when the reports included results from T1- and T2-mapping sequences; the difference was statistically significant for residents 1 and 3 (p = 0.004 and p = 0.02, respectively).

Conclusion: GPT-4 yielded good accuracy for diagnosing myocarditis based on CMR reports in a large dataset from multiple centers and therefore holds the potential to serve as a diagnostic decision-supporting tool in this capacity, particularly for less experienced physicians. Further studies are required to explore the full potential and elucidate educational aspects of the integration of large language models in medical decision-making.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11414660
DOI: http://dx.doi.org/10.1016/j.jocmr.2024.101068

Similar Publications

Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.

JMIR Med Inform

September 2025

Departments of Radiology, The Third Affiliated Hospital, Sun Yat-Sen University, 600 Tianhe Road, Guangzhou, Guangdong, 510630, China, 86 18922109279, 86 20852523108.

Youmei Chen, Mengshi Dong, Jie Sun, Zhanao Meng, Yiqing Yang

Background: Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.

Objective: To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.


TransFactor-Prediction of pro-viral SARS-CoV-2 host factors using a protein language model.

Bioinformatics

September 2025

Computational Health Center, Helmholtz Center Munich, Neuherberg, 85764, Germany.

Yang An, Valter Bergant, Samuele Firmani, Corinna Grünke, Batiste Bonnal

Motivation: Recent pandemics have revealed significant gaps in our understanding of viral pathogenesis, exposing an urgent need for methods to identify and prioritize key host proteins (host factors) as potential targets for antiviral treatments. De novo generation of experimental datasets is limited by their heterogeneity, and for looming future pandemics, may not be feasible due to limitations of experimental approaches.

Results: Here we present TransFactor, a computational framework for predicting and prioritizing candidate host factors using only protein sequence data.


ChatGPT-4o is Not a Reliable Study Source for Orthopaedic Surgery Residents.

JB JS Open Access

September 2025

Department of Orthopaedic Surgery, St. Luke's University Health Network, Bethlehem, Pennsylvania.

Neil Jain, Caleb Gottlich, John Fisher, Travis Winston, Kristofer Matullo

Background: The use of artificial intelligence platforms by medical residents as an educational resource is increasing. Within orthopaedic surgery, older Chat Generative Pre-trained Transformer (ChatGPT) models performed worse than resident physicians on practice examinations and rarely answered questions with images correctly. The newer ChatGPT-4o was designed to improve these deficiencies but has not been evaluated.


Deep-learning based morphological segmentation of canine diffuse large B-cell lymphoma.

Front Vet Sci

August 2025

Pathobiology and Population Science, Royal Veterinary College, Hatfield, United Kingdom.

Kenneth Ancheta, Androniki Psifidi, Andrew D Yale, Sophie Le Calvez, Jonathan Williams

Diffuse large B-cell lymphoma is the most common type of non-Hodgkin lymphoma (NHL) in humans, accounting for about 30-40% of NHL cases worldwide. Canine diffuse large B-cell lymphoma (cDLBCL) is the most common lymphoma subtype in dogs and demonstrates an aggressive biologic behaviour. For tissue biopsies, current confirmatory diagnostic approaches for enlarged lymph nodes rely on expert histopathological assessment, which is time-consuming and requires specialist expertise.


Advancing Prognostics in Oncology: Developing a Machine Learning Model for Predicting 2-Year and 5-Year Survival Rates in Patients with Undifferentiated Pleomorphic Sarcoma.

Ann Surg Oncol

September 2025

Orthopaedic Oncology Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA.

Andrew G Girgis, Bishoy M Galoaa, Marcos R Gonzalez, Santiago A Lozano-Calderon

Background: Undifferentiated pleomorphic sarcoma (UPS) is a prevalent soft tissue sarcoma subtype associated with poor prognosis. Current prognostic tools lack the ability to incorporate personalized data for predicting survival. Machine learning (ML) offers a potential solution to enhance survival prediction accuracy.
