An NLP-based method to mine gene and function relationships from published articles.

Sci Rep

Department of Biology, University of Alabama at Birmingham, 3100 East Science Hall, 902 14th Street South, Birmingham, AL, 35294, USA.

Published: March 2025


Citations

20

Article Abstract

Understanding how genes function within biological systems is paramount for scientific advancement and medical progress. The task is challenging, however, because the research landscape keeps evolving and biological processes are complex. We introduce PATHAK, a natural language processing (NLP)-based method that mines relationships between genes and their functions from published scientific articles. PATHAK uses a pre-trained Transformer language model to generate sentence embeddings from a large dataset of scientific documents, which enables the identification of meaningful associations between genes and their potential functional annotations. Our approach is adaptable and applicable across diverse scientific domains. Applying PATHAK to over 17,000 research articles focused on Arabidopsis thaliana, we assigned approximately 1493 GO terms to 10,976 genes by analyzing article sentences, comparing their embeddings to GO term embeddings, and mapping potential matches. The model demonstrates moderate-to-high predictive accuracy, capturing ~57% overlap of GO terms (6258 out of 10,976) between predicted and known annotations on TAIR, including 1271 and 161 exact matches and 4826 partially related terms. This method promises to significantly advance our understanding of gene functionality and potentially accelerate discoveries in plant development, growth, and stress responses, in plants and other systems.
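The matching step described in the abstract (comparing sentence embeddings to GO term embeddings and mapping potential matches) can be illustrated with a minimal sketch. This is not the PATHAK implementation: the vectors are hand-made toy values rather than Transformer outputs, the 0.8 cosine-similarity threshold is an assumption, and the names `cosine`, `match_go_terms`, `GENE_1`, and `GO_TERM_A` are hypothetical placeholders. The last lines only re-derive the overlap percentage from the counts quoted in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_go_terms(sentence_vecs, go_vecs, threshold=0.8):
    """Map each gene to its most similar GO term, if similar enough.

    sentence_vecs: gene name -> embedding of a sentence mentioning that gene
    go_vecs:       GO term label -> embedding of the term description
    threshold:     assumed cut-off; the abstract does not state one
    """
    matches = {}
    for gene, vec in sentence_vecs.items():
        scored = [(term, cosine(vec, gv)) for term, gv in go_vecs.items()]
        best_term, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            matches[gene] = best_term
    return matches


# Toy 3-dimensional "embeddings" (hypothetical; real ones have hundreds of dims).
go_vecs = {
    "GO_TERM_A": [1.0, 0.0, 0.0],
    "GO_TERM_B": [0.0, 1.0, 0.0],
}
sentence_vecs = {
    "GENE_1": [0.9, 0.1, 0.0],  # points almost exactly along GO_TERM_A
    "GENE_2": [0.5, 0.5, 0.7],  # not close to either term
}

matches = match_go_terms(sentence_vecs, go_vecs)
print(matches)

# Overlap reported in the abstract: 6258 of 10,976.
overlap_fraction = 6258 / 10976
print(f"{overlap_fraction:.1%}")
```

In this toy setup only `GENE_1` clears the threshold, so `GENE_2` receives no annotation. A real pipeline would also need to decide how to aggregate several sentences per gene, which the abstract does not specify.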

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876572
DOI: http://dx.doi.org/10.1038/s41598-025-91809-z

Publication Analysis

Top Keywords

nlp-based method: 8
method mine: 4
mine gene: 4
gene function: 4
function relationships: 4
relationships published: 4
published articles: 4
articles understanding: 4
understanding intricacies: 4
genes: 4

Similar Publications