Severity: Warning
Message: file_get_contents(https://...@gmail.com&api_key=61f08fa0b96a73de8c900d749fcb997acc09&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests
Filename: helpers/my_audit_helper.php
Line Number: 197
Backtrace:
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 197
Function: file_get_contents
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 271
Function: simplexml_load_file_from_url
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 1075
Function: getPubMedXML
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3195
Function: GetPubMedArticleOutput_2016
File: /var/www/html/application/controllers/Detail.php
Line: 597
Function: pubMedSearch_Global
File: /var/www/html/application/controllers/Detail.php
Line: 511
Function: pubMedGetRelatedKeyword
File: /var/www/html/index.php
Line: 317
Function: require_once
98%
921
2 minutes
20
Objectives: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of Natural Language Processing (NLP) techniques, particularly Large Language Models (LLMs) and transformer architectures, in deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. This review aims to assess data and model accessibility in the most recent literature, gaining a better understanding of the existing capabilities and constraints of these tools in processing genomic sequencing data.
Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, our scoping review was conducted across PubMed, Medline, Scopus, Web of Science, Embase, and ACM Digital Library. Studies were included if they focused on NLP methodologies applied to genomic sequencing data analysis, without restrictions on publication date or article type.
Results: A total of 26 studies published between 2021 and April 2024 were selected for review. The review highlights that tokenization and transformer models enhance the processing and understanding of genomic data, with applications in predicting regulatory annotations like transcription-factor binding sites and chromatin accessibility.
Discussion: The application of NLP and LLMs to genomic sequencing data interpretation is a promising field that can help streamline the processing of large-scale genomic data while providing a better understanding of its complex structures. It can potentially drive advancements in personalized medicine by offering more efficient and scalable solutions for genomic analysis. Further research is needed to discuss and overcome limitations, enhancing model transparency and applicability.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623714 | PMC |