Survey and improvement strategies for gene prioritization with large language models.

Matthew B Neeley, Guantong Qi, Guanchu Wang, Ruixiang Tang, Dongxue Mao, Chaozhong Liu, Sasidhar Pasupuleti, Bo Yuan, Fan Xia, Pengfei Liu, Zhandong Liu, Xia Hu

Bioinform Adv

Department of Computer Science, Rice University, Houston, TX, 77005, United States.

Published: June 2025


Motivation: Rare diseases remain difficult to diagnose due to limited patient data and genetic diversity, with many cases remaining undiagnosed despite advances in variant prioritization tools. While large language models have shown promise in medical applications, their optimal application for trustworthy and accurate gene prioritization downstream of modern prioritization tools has not been systematically evaluated.

Results: We benchmarked various language models for gene prioritization, using multi-agent and Human Phenotype Ontology classification approaches to categorize patient cases by phenotype-based solvability level. To address the limitations of language models in ranking large gene sets, we implemented a divide-and-conquer strategy with mini-batching and token limiting for improved efficiency. GPT-4 outperformed the other language models across all patient datasets, ranking causal genes more accurately. The multi-agent and Human Phenotype Ontology classification approaches effectively distinguished confidently solved cases from challenging ones. However, we observed two notable language model limitations: bias toward well-studied genes and sensitivity to input order. Our divide-and-conquer strategy enhanced accuracy, overcoming positional bias and bias arising from how frequently genes appear in the literature. Compared to baseline evaluation, this framework optimized the overall process of identifying disease-causal genes, better enabling targeted diagnostic and therapeutic interventions and streamlining the diagnosis of rare genetic disorders.
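
The abstract does not spell out how the divide-and-conquer strategy is implemented, so the following is only a minimal sketch of one way mini-batched ranking could work, not the authors' method. The function names (`divide_and_conquer_rank`, `toy_ranker`), the parameters `batch_size` and `keep_per_batch`, the per-round shuffle, and the placeholder gene labels and scores are all assumptions made for illustration. In practice, `rank_batch` would wrap a language model call that ranks a short gene list against the patient's phenotypes; here a deterministic stand-in is used so the sketch runs on its own.

```python
import random
from typing import Callable, List, Sequence

# A ranker takes a small batch of gene labels and returns them best-first.
# In a real pipeline this would wrap a language model call (assumption).
Ranker = Callable[[Sequence[str]], List[str]]


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive mini-batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def divide_and_conquer_rank(
    genes: Sequence[str],
    rank_batch: Ranker,
    batch_size: int = 5,
    keep_per_batch: int = 2,
    seed: int = 0,
) -> List[str]:
    """Rank a large candidate list by repeatedly ranking small batches.

    Each round ranks every mini-batch, keeps the top `keep_per_batch`
    genes from each, and pools the survivors. Once the pool fits in a
    single batch, one final ranking call produces the result.
    """
    if batch_size < 2 or not (1 <= keep_per_batch < batch_size):
        raise ValueError("need batch_size >= 2 and 1 <= keep_per_batch < batch_size")
    rng = random.Random(seed)
    pool = list(genes)
    while len(pool) > batch_size:
        # Shuffling each round is one hypothetical way to reduce the
        # influence of input order on which genes share a batch.
        rng.shuffle(pool)
        survivors: List[str] = []
        for batch in chunk(pool, batch_size):
            survivors.extend(rank_batch(batch)[:keep_per_batch])
        pool = survivors
    return rank_batch(pool)


# Placeholder labels and scores, not real genes or real model output.
toy_scores = {f"GENE_{i:02d}": 1.0 / i for i in range(1, 13)}


def toy_ranker(batch: Sequence[str]) -> List[str]:
    """Deterministic stand-in for a language model ranking call."""
    return sorted(batch, key=lambda g: toy_scores[g], reverse=True)


if __name__ == "__main__":
    ranked = divide_and_conquer_rank(list(toy_scores), toy_ranker, batch_size=4)
    print(ranked)
```

Because each ranking call only ever sees `batch_size` genes, the prompt length stays bounded no matter how many candidates the upstream prioritization tool returns. With a consistent ranker, the best gene always survives its batch, so it reaches the final round regardless of where it appeared in the input.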

Availability and Implementation: Software and additional material are available at: https://github.com/LiuzLab/GPT-Diagnosis.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263109
DOI: http://dx.doi.org/10.1093/bioadv/vbaf148
