On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models.

Majid Afshar, Yanjun Gao, Deepak Gupta, Emma Croxford, Dina Demner-Fushman

J Biomed Inform

National Library of Medicine, NIH, HHS, 8600 Rockville Pike, Bethesda, 20894, MD, USA.

Published: September 2024

Objective: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) as a potential replacement for traditional systems raises questions about the quality and extent of the medical knowledge in the models' internal knowledge representations and about the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and UMLS-based metrics for text generated by LLMs.

Methods: We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in the electronic health records using the ConsumerQA and Problem Summarization datasets. Probing LLMs for the UMLS knowledge was performed by prompting the LLM to complete the diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated the UMLS graph paths and clinical notes in prompting the LLMs. The results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation.
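One way to score the path-completion probe described above is a set-based F1 over the (relation, target concept) pairs a model proposes for a given source concept. The sketch below is a minimal illustration under that assumption; the abstract does not specify the matching scheme, and the function name `one_hop_f1`, the case-insensitive exact matching, and the toy pairs are all hypothetical rather than taken from the study.

```python
def one_hop_f1(predicted, gold):
    """Set-based precision, recall, and F1 over (relation, target) pairs.

    Matching is case-insensitive exact string match, which is an
    assumption made for this sketch, not the study's stated method.
    """
    pred = {(r.strip().lower(), t.strip().lower()) for r, t in predicted}
    ref = {(r.strip().lower(), t.strip().lower()) for r, t in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    tp = len(pred & ref)  # pairs the model got right
    precision = tp / len(pred)
    recall = tp / len(ref)
    if tp == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data invented for illustration; not drawn from the UMLS or the paper.
gold = [("may_treat", "Hypertension"), ("isa", "Diuretic")]
predicted = [
    ("may_treat", "hypertension"),
    ("isa", "Antibiotic"),
    ("has_ingredient", "Water"),
]

precision, recall, f1 = one_hop_f1(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

With the toy pairs, one of three predictions matches one of two reference pairs, so precision is 1/3, recall is 1/2, and F1 is 0.4. A real evaluation would likely need concept normalization (for example, mapping strings to concept identifiers) before matching.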

Results: In probing the UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with the UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments.
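The agreement between an automatic metric and human judgments is often summarized with a rank correlation. The sketch below computes Spearman's rho in plain Python as one possible way to do this; the abstract does not say which correlation coefficient the authors used, and the helper names and the scores in the example are invented for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-example scores, invented for this sketch.
metric_scores = [0.62, 0.48, 0.71, 0.55, 0.66]
human_ratings = [3, 4, 2, 5, 3]
print(f"Spearman rho = {spearman(metric_scores, human_ratings):.3f}")
```

A value near zero would indicate the kind of weak alignment the abstract reports, while values near 1 or -1 would indicate strong monotonic agreement or disagreement.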

Conclusion: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with UMLS knowledge provides performance gains in diagnosis generation. The UMLS needs to be tailored to the task to improve the LLMs' predictions. Finding evaluation metrics that align with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11402555
DOI: http://dx.doi.org/10.1016/j.jbi.2024.104707

Top Keywords: umls knowledge, medical knowledge, umls paths, evaluation metrics, knowledge, umls, diagnosis generation, large language, language models, llms

Similar Publications

The Data Distillery: A Graph Framework for Semantic Integration and Querying of Biomedical Data.

bioRxiv

September 2025

Taha Mohseni Ahooyi, Benjamin Stear, J Alan Simmons, Vincent T Metzger, Praveen Kumar

The Data Distillery Knowledge Graph (DDKG) is a framework for semantic integration and querying of biomedical data across domains. Built for the NIH Common Fund Data Ecosystem, it supports translational research by linking clinical and experimental datasets in a unified graph model. Clinical standards such as ICD-10, SNOMED, and DrugBank are integrated through UMLS, while genomics and basic science data are structured using ontologies and standards such as HPO, GENCODE, Ensembl, STRING, and ClinVar.

Infusing clinical knowledge into language models by subword optimisation and embedding initialisation.

Comput Biol Med

September 2025

University College London, Institute of Health Informatics, 222 Euston Rd., London, NW1 2DA, UK; School of Health and Wellbeing, University of Glasgow, UK. Electronic address:

Abul Hasan, Jinge Wu, Quang Ngoc Nguyen, Salomé Andres, Imane Guellil

Objective: This study introduces a novel tokenisation methodology, K-Tokeniser, to infuse clinical knowledge into language models for clinical text processing.

Methods: Technically, at initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like Unified Medical Language System or the training data of the task related corpus. At training or inference stage, sentence level localised context will be utilised for choosing the optimal global token representation to realise the semantic-based tokenisation.

LLM-Integrated Normalization and Knowledge for FHIR (LINK-FHIR).

Stud Health Technol Inform

August 2025

Department of Biomedical Engineering and Informatics.

Zhen Hou, Ming Jiang, Hao Liu, Yan Zhuang

Current approaches lack efficient methods to convert diverse healthcare data formats into standardized Fast Healthcare Interoperability Resources (FHIR). LINK-FHIR is a novel system for converting diverse Electronic Health Records into FHIR-compliant resources. The system leverages fine-tuned Large Language Models through a unified pipeline to efficiently process unstructured clinical notes, semi-structured lab reports, and structured tables.

Leveraging Large Language Models for Personalized Parkinson's Disease Treatment.

IEEE J Biomed Health Inform

August 2025

Rongqian Zhang, Guanwen Xie, Jie Ying, Zhongsheng Hua

Parkinson's Disease (PD) treatment is challenging due to symptom heterogeneity and the lack of a definitive cure. Lifelong medication requires personalized treatment plans developed by physicians, but such approaches are constrained by high costs and limited physician capacity. Although deep learning (DL) methods have been explored, they lack interpretability and are restricted to numerical data inputs.

Relation prediction in knowledge graphs: A self-organizing neural network approach.

Neural Netw

October 2025

School of Computing and Information Systems, Singapore Management University, 80 Stamford Road, 178902, Singapore. Electronic address:

Budhitama Subagdja, D Shanthoshigaa, Ah-Hwee Tan

Knowledge graphs (KGs) in specialized domains frequently suffer from incomplete information. While current relation prediction methods for KG completion typically rely on neural network-based representation learning, we present KG2ART, a novel self-organizing neural network that employs a fundamentally different approach. KG2ART performs parallel inference over the graph structure through bidirectional interactions between bottom-up activations and top-down pattern matching to conduct relation prediction without representation learning.
