Enhancing Disease Detection in Radiology Reports Through Fine-tuning Lightweight LLM on Weak Labels.

AMIA Jt Summits Transl Sci Proc

Department of Population Health Sciences, Weill Cornell Medicine, New York.

Published: June 2025


Citations

20

Article Abstract

Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still hinder their practical application. Among these are constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning on datasets with synthetic labels. Two tasks are jointly trained by combining their respective instruction datasets. When the quality of the task-specific synthetic labels is relatively high (e.g., labels generated by GPT-4o), Llama 3.1-8B achieves satisfactory performance on the open-ended disease detection task, with a micro F1 score of 0.91. Conversely, when the quality of the task-relevant synthetic labels is relatively low (e.g., labels from the MIMIC-CXR dataset), the fine-tuned Llama 3.1-8B surpasses its noisy teacher labels (micro F1 score of 0.67 vs. 0.63) when both are compared against curated labels, indicating the model's strong inherent capability. These findings demonstrate the potential of fine-tuning LLMs with synthetic labels and offer a promising direction for future research on LLM specialization in the medical domain.
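The abstract reports its results as micro F1 scores (0.91; 0.67 vs. 0.63). The sketch below illustrates how a micro-averaged F1 is commonly computed for multi-label, open-ended disease detection. It pools true positives, false positives, and false negatives over all reports, then computes precision, recall, and F1 once from the pooled counts. It is not the paper's evaluation code. The exact set matching and the lowercase/whitespace normalization are assumptions for illustration, and the names `micro_f1` and `_normalize` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def _normalize(labels: Iterable[str]) -> Set[str]:
    # Hypothetical normalization: exact match after lowercasing and trimming.
    return {label.strip().lower() for label in labels}


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-report label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reports")
    tp = fp = fn = 0
    for g_raw, p_raw in zip(gold, pred):
        g, p = _normalize(g_raw), _normalize(p_raw)
        tp += len(g & p)  # diseases both in the reference and predicted
        fp += len(p - g)  # predicted but absent from the reference labels
        fn += len(g - p)  # in the reference labels but missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{"pneumonia", "pleural effusion"}, {"cardiomegaly"}, set()]
    pred = [{"pneumonia"}, {"cardiomegaly", "edema"}, set()]
    print(micro_f1(gold, pred))
```

In this toy example, the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3. Because micro averaging pools counts before dividing, frequent findings weigh more heavily than rare ones.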

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150749

Publication Analysis

Top Keywords

synthetic labels (16)
llama 3.1-8b (12)
disease detection (8)
lightweight llm (8)
medical domain (8)
micro F1 score (8)
labels (7)
enhancing disease (4)
detection radiology (4)
radiology reports (4)

Similar Publications

PERC: a suite of software tools for the curation of cryoEM data with application to simulation, modeling and machine learning.

Acta Crystallogr F Struct Biol Commun

October 2025

Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom.

Ease of access to data, tools and models expedites scientific research. In structural biology there are now numerous open repositories of experimental and simulated data sets. Being able to easily access and utilize these is crucial to allow researchers to make optimal use of their research effort.

Chromatin dynamics play a crucial role in cellular differentiation, yet tools for studying global chromatin mobility in living cells remain limited. Here, a novel probe is developed for the metabolic labeling of chromatin and for tracking its mobility during neural differentiation. The labeling system uses a newly developed silicon rhodamine-conjugated deoxycytidine triphosphate (dCTP).

Microbial spoilage and oxidation are significant causes of food deterioration, contributing to food waste of up to 30%. Active food packaging is an effective way to mitigate these losses. Given the excellent properties of nanofibers produced by electrospinning, integrating active food packaging functionality with nanofiber technology offers an ideal approach to enhancing preservation.

Differentiating the processing degree of animal material by mass spectrometry: A feasibility study on porcine and bovine blood-derived feed ingredients.

Food Res Int

November 2025

German Federal Institute for Risk Assessment (BfR), Department Food Safety, National Reference Laboratory for Animal Protein in Feed, Max-Dohrn-Str. 8-10, 10589 Berlin, Germany. Electronic address:

Processing food and feed sets off a variety of reactions (Maillard, (lipid) oxidation), which may be traced through covalent changes to, e.g., proteins.

The retinol isotope dilution (RID) test, considered the reference standard, is the most sensitive method for assessing vitamin A status by estimating total liver reserves. For gas chromatography-combustion-isotope ratio mass spectrometry detection, ¹³C is added to the retinol moiety. The synthetic procedure for ¹³C-retinyl acetate begins with the naturally occurring β-ionone.
