Automated anonymization of radiology reports: comparison of publicly available natural language processing and large language models.

Category Ranking: 98% | Total Visits: 921 | Avg Visit Duration: 2 minutes | Citations: 20

Article Abstract

Purpose: Medical reports, governed by HIPAA regulations, contain personal health information (PHI), restricting secondary data use. Utilizing natural language processing (NLP) and large language models (LLM), we sought to employ publicly available methods to automatically anonymize PHI in free-text radiology reports.

Materials And Methods: We compared two publicly available rule-based NLP models (spaCy-based; one accuracy-optimized, one speed-optimized; iteratively improved on 400 free-text CT reports (test set)) and one offline LLM approach (LLM-model: LLaMa-2, Meta-AI) for PHI anonymization. The three models were tested on 100 randomly selected chest CT reports. Two investigators assessed whether each occurring PHI entity was anonymized and whether clinical information was removed. Precision, recall, and F1 scores were then calculated.
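As a rough illustration of what a rule-based PHI scrubber of this kind might look like, the sketch below replaces dates, medical record numbers (MRN), and accession numbers (ACC) with placeholder tokens. The regex patterns, label formats, and placeholder tokens are illustrative assumptions, not the paper's actual rules:

```python
import re

# Hypothetical rule set: each pattern maps a PHI category to a placeholder.
# Formats (MM/DD/YYYY dates, "MRN <digits>", "ACC <digits>") are assumed.
PHI_RULES = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
    (re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\bACC[:\s]*\d{6,12}\b", re.IGNORECASE), "[ACC]"),
]

def scrub(report: str) -> str:
    """Replace PHI matches with placeholders, leaving clinical text intact."""
    for pattern, token in PHI_RULES:
        report = pattern.sub(token, report)
    return report

print(scrub("CT chest on 03/14/2023, MRN 1234567, ACC 987654321: no acute findings."))
# -> CT chest on [DATE], [MRN], [ACC]: no acute findings.
```

Unlike generative rewriting with an LLM, a deterministic substitution like this cannot delete clinical text outside the matched spans, which is consistent with the study's finding that the rule-based models never removed medical information.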

Results: Both NLP models successfully removed all instances of dates (n = 333), medical record numbers (MRN, n = 6), and accession numbers (ACC, n = 92). The LLM model removed all MRNs, 96% of ACCs, and 32% of dates. The accuracy-optimized NLP model was the most consistent, with a perfect F1 score of 1.00, followed by the speed-optimized model with lower precision (0.86) and F1 score (0.92) for dates. The LLM model had perfect precision for MRNs, ACCs, and dates but the lowest recall for ACCs (0.96) and dates (0.52), with corresponding F1 scores of 0.98 and 0.68, respectively. Names were removed completely or almost completely (i.e., at most one first or family name left non-anonymized) in 100% (accuracy-optimized NLP), 72% (speed-optimized NLP), and 90% (LLM model) of reports. Importantly, neither NLP model removed medical information, whereas the LLM model did in 10% of reports (n = 10).
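The reported F1 scores follow directly from the precision and recall values as their harmonic mean; a quick check reproduces the abstract's numbers:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract:
print(round(f1(0.86, 1.00), 2))  # dates, lower-precision NLP model -> 0.92
print(round(f1(1.00, 0.96), 2))  # LLM model, accession numbers     -> 0.98
print(round(f1(1.00, 0.52), 2))  # LLM model, dates                 -> 0.68
```

Note that perfect precision with low recall (the LLM model on dates) means every date it removed was a true date, but it missed many; the harmonic mean penalizes that imbalance heavily.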

Conclusion: Pre-trained NLP models can effectively anonymize free-text radiology reports, while anonymization with the LLM model is more prone to deleting medical information.

Key Points: Question This study compares NLP and locally hosted LLM techniques to ensure PHI anonymization without losing clinical information. Findings Pre-trained NLP models effectively anonymized radiology reports without removing clinical data, while a locally hosted LLM was less reliable, risking the loss of important information. Clinical relevance Fast, reliable, automated anonymization of PHI from radiology reports enables HIPAA-compliant secondary use, facilitating advanced applications like LLM-driven radiology analysis while ensuring ethical handling of sensitive patient data.

Source: http://dx.doi.org/10.1007/s00330-024-11148-x

Publication Analysis

Top Keywords: radiology reports (16), llm model (16), nlp (15), nlp models (12), automated anonymization (8), natural language (8), language processing (8), large language (8), language models (8), llm (8)
