Background: Feature selection is a critical step for translating advances afforded by systems-scale molecular profiling into actionable clinical insights. While data-driven methods are commonly utilized for selecting candidate genes, knowledge-driven methods must contend with the challenge of efficiently sifting through extensive volumes of biomedical information. This work aimed to assess the utility of large language models (LLMs) for knowledge-driven gene prioritization and selection.
Methods: In this proof of concept, we focused on 11 blood transcriptional modules associated with an erythroid cell signature. We first evaluated four leading LLMs across multiple tasks, then established a workflow leveraging LLMs. The steps consisted of: (1) Selecting one of the 11 modules; (2) Identifying functional convergences among constituent genes using the LLMs; (3) Scoring candidate genes across six criteria capturing the gene's biological and clinical relevance; (4) Prioritizing candidate genes and summarizing justifications; (5) Fact-checking justifications and identifying supporting references; (6) Selecting a top candidate gene based on validated scoring justifications; and (7) Factoring in transcriptome profiling data to finalize the selection of the top candidate gene.
Results: Of the four LLMs evaluated, OpenAI's GPT-4 and Anthropic's Claude demonstrated the best performance and were chosen for the implementation of the candidate gene prioritization and selection workflow. This workflow was run in parallel for each of the 11 erythroid cell modules by participants in a data mining workshop. Module M9.2 served as an illustrative use case. The 30 candidate genes forming this module were assessed, and the top five scoring genes were identified as BCL2L1, ALAS2, SLC4A1, CA1, and FECH. Researchers carefully fact-checked the summarized scoring justifications, after which the LLMs were prompted to select a top candidate based on this information. GPT-4 initially chose BCL2L1, while Claude selected ALAS2. When transcriptional profiling data from three reference datasets were provided for additional context, GPT-4 revised its initial choice to ALAS2, whereas Claude reaffirmed its original selection for this module.
Conclusions: Taken together, our findings highlight the ability of LLMs to prioritize candidate genes with minimal human intervention. This suggests the potential of this technology to boost productivity, especially for tasks that require leveraging extensive biomedical knowledge.
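As an illustration of steps (3) and (4) of the workflow, the sketch below shows one way per-criterion scores from several LLMs could be combined into a ranking. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the six criteria, the scoring scale, or how scores from different models were combined, so the criterion names, the averaging across models, the gene labels, and all score values here are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical placeholders: the abstract does not name the six criteria.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def rank_genes(scores_by_model, top_n=5):
    """Rank genes by their total score, averaged across models.

    scores_by_model maps model name -> {gene -> {criterion -> score}}.
    Averaging across models is an assumption made for this sketch.
    """
    totals = {}
    for model_scores in scores_by_model.values():
        for gene, crit_scores in model_scores.items():
            # Sum the six criterion scores into one total per gene and model.
            total = sum(crit_scores[c] for c in CRITERIA)
            totals.setdefault(gene, []).append(total)
    averaged = {gene: mean(vals) for gene, vals in totals.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def uniform(score):
    """Build a made-up score sheet with the same value for every criterion."""
    return {c: score for c in CRITERIA}


# Made-up gene labels and scores, for illustration only.
example = {
    "model_a": {"GENE_A": uniform(8), "GENE_B": uniform(5), "GENE_C": uniform(2)},
    "model_b": {"GENE_A": uniform(6), "GENE_B": uniform(7), "GENE_C": uniform(2)},
}

top_two = rank_genes(example, top_n=2)
print(top_two)
```

In the workflow described above, a ranking like this would only produce a shortlist; the fact-checking and final selection steps (5) to (7) still depend on human review and on the transcriptome profiling data.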
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580627 | PMC
http://dx.doi.org/10.1186/s12967-023-04576-8 | DOI Listing
Eur J Gastroenterol Hepatol
September 2025
Department of Gastroenterology, First Affiliated Hospital of Shantou University Medical College, Shantou.
Background: Crohn's disease (CD) and rheumatoid arthritis (RA) are autoimmune diseases. CD is known to be closely associated with RA. However, the mechanisms underlying these relationships remain unclear.
Clin Appl Thromb Hemost
September 2025
Pediatric Hematology Laboratory, Division of Hematology/Oncology, Department of Pediatrics, The Seventh Affiliated Hospital of Sun Yat-Sen University, Shenzhen, Guangdong, China.
Hemophilia, an X-linked monogenic disorder, arises from mutations in the F8 or F9 genes, which encode clotting factor VIII (FVIII) or clotting factor IX (FIX), respectively. As a prominent hereditary coagulation disorder, hemophilia is clinically manifested by spontaneous hemorrhagic episodes. Severe cases may progress to complications such as stroke and arthropathy, significantly compromising patients' quality of life.
Biotechnol Lett
September 2025
Unit of Microbiology and Immunology, Vector Control Research Centre, Indian Council of Medical Research, Department of Health Research, Ministry of Health and Family Welfare, Puducherry, 605006, India.
Effective mosquito control is essential for reducing the transmission of vector-borne diseases. This study focuses on the comprehensive characterization of mosquitocidal toxins produced by Bacillus thuringiensis serovar israelensis (Bti) VCRC B646 and the associated insecticidal genes. The bacterium was cultured, and the spore-crystal complex was purified to identify the mosquitocidal proteins.
Funct Integr Genomics
September 2025
Department of Plastic Surgery, the First Affiliated Hospital of Fujian Medical University, Fuzhou, 350005, China.
Keloid scarring and Metabolic Syndrome (MS) are distinct conditions marked by chronic inflammation and tissue dysregulation, suggesting shared pathogenic mechanisms. Identifying common regulatory genes could unveil novel therapeutic targets. Methods.
Mar Biotechnol (NY)
September 2025
Yazhou Bay Innovation Institute, Hainan Tropical Ocean University, Sanya, China.
Epinephelus tukula is an economically important aquaculture animal, and a major parent in grouper crossbreeding. To better preserve and exploit E. tukula germplasm resources, a core collection (containing 34 individuals derived from 10 genetic groups) was first constructed based on phenotypic growth traits and whole-genome resequencing (WGS) data.