CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.

Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon

Eur Radiol

Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea.

Published: July 2025

Objective: This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.

Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 had free-text radiology reports (Dataset 2). After pre-training a vision transformer on Dataset 1, we integrated it with an LLM in an architecture influenced by the LLaVA network. The model was then fine-tuned, primarily on Dataset 2. We evaluated the model's diagnostic performance for major pathological findings, and human radiologists rated the acceptability of its radiologic reports to gauge its potential for autonomous reporting.

Results: The model achieved an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. Its F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, below the 84.0% rate of ground truth reports.
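As an illustration of how an average F1 score across several findings can be computed, the sketch below calculates a per-finding F1 from binary labels and then takes the unweighted mean. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the abstract does not say how the average was formed, so the unweighted (macro) mean is an assumption, and the finding names and toy labels are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for one binary finding: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_f1(
    labels: Dict[str, List[int]], preds: Dict[str, List[int]]
) -> Tuple[float, Dict[str, float]]:
    """Unweighted mean of per-finding F1 scores (assumed averaging scheme)."""
    per_finding = {name: f1_score(labels[name], preds[name]) for name in labels}
    return sum(per_finding.values()) / len(per_finding), per_finding


# Hypothetical toy data: two placeholder findings, four images each.
toy_labels = {"finding_a": [1, 1, 0, 0], "finding_b": [1, 0, 1, 0]}
toy_preds = {"finding_a": [1, 0, 1, 0], "finding_b": [1, 0, 1, 0]}

mean_f1, per_finding_f1 = average_f1(toy_labels, toy_preds)
print(per_finding_f1, mean_f1)
```

With an unweighted mean, every finding contributes equally regardless of how often it occurs, so rare findings can pull the average down noticeably.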

Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.

Key Points:

Question: How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports?

Findings: The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with higher accuracy than general-purpose models.

Clinical relevance: This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166004
DOI: http://dx.doi.org/10.1007/s00330-024-11339-6
