Unsupervised evaluation of pre-trained DNA language model embeddings.

BMC Genomics

Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Wolstein Research Building, Cleveland, OH, USA.

Published: August 2025


Citations: 20

Article Abstract

Background: DNA Language Models (DLMs) have generated considerable hope and hype for solving complex genetics tasks. These models have demonstrated remarkable performance in tasks such as gene finding, enhancer annotation, and histone modification prediction. However, they have struggled with tasks such as learning individual-level personal transcriptome variation, highlighting the need for robust evaluation approaches. Current evaluation approaches assess models on multiple downstream tasks, which is computationally demanding and fails to evaluate their ability to learn as generalist agents.

Results: We propose a framework to evaluate DLM embeddings using unsupervised metrics based on numerical linear algebra: RankMe, NESum, and StableRank. Embeddings were generated from six state-of-the-art DLMs (Nucleotide Transformer, DNA-BERT2, HyenaDNA, MistralDNA, GENA-LM, and GROVER) across multiple genomic benchmark datasets. Our analysis revealed several key insights. First, low pairwise Pearson correlations and limited variance captured by the top principal components suggest that DLM embeddings are high-dimensional and non-redundant. Second, GENA-LM frequently demonstrated strong performance across all unsupervised evaluation metrics, often outperforming other models. Third, while all models performed well on supervised classification tasks, GENA-LM achieved the highest accuracy and F1 scores across most datasets. Importantly, we observed a positive correlation between unsupervised metrics and supervised performance, supporting the utility of unsupervised metrics as effective proxies for model quality assessment.
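The abstract names the three metrics but does not give their formulas. The following is a minimal sketch using the definitions commonly cited for them, which is an assumption here and not necessarily the exact procedure used in the study: RankMe as the exponential of the Shannon entropy of the normalized singular values, StableRank as the squared Frobenius norm divided by the squared spectral norm, and NESum as the sum of covariance eigenvalues divided by the largest eigenvalue. The function names, the `eps` smoothing constant, and the convention that embeddings form an `(n_samples, n_dims)` array are illustrative choices.

```python
import numpy as np


def _singular_values(embeddings):
    """Singular values of an (n_samples, n_dims) matrix, in descending order."""
    X = np.asarray(embeddings, dtype=float)
    return np.linalg.svd(X, compute_uv=False)


def rankme(embeddings, eps=1e-7):
    """Assumed definition: exp of the entropy of the L1-normalized singular values."""
    s = _singular_values(embeddings)
    p = s / s.sum() + eps  # eps keeps log() finite for zero singular values
    return float(np.exp(-np.sum(p * np.log(p))))


def stable_rank(embeddings):
    """Assumed definition: ||X||_F^2 / ||X||_2^2 = sum(s_i^2) / max(s_i)^2."""
    s = _singular_values(embeddings)
    return float(np.sum(s ** 2) / s[0] ** 2)


def nesum(embeddings):
    """Assumed definition: sum of covariance eigenvalues over the largest one."""
    X = np.asarray(embeddings, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # (n_dims, n_dims)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending, non-negative
    return float(eig.sum() / eig[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for DLM embeddings: 200 sequences, 64 dimensions.
    emb = rng.normal(size=(200, 64))
    print("RankMe:", rankme(emb))
    print("StableRank:", stable_rank(emb))
    print("NESum:", nesum(emb))
```

Under these definitions, all three scores approach 1 when the embeddings collapse onto a single direction and grow as variance spreads over more dimensions, which is why they can be computed without labels or downstream training.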

Conclusion: This study introduces a computationally efficient framework for evaluating DLMs. Our results show that GENA-LM, DNA-BERT2, and Nucleotide Transformer frequently outperform HyenaDNA and MistralDNA across both unsupervised and supervised evaluations. Moreover, the observed positive correlations between unsupervised metrics and downstream classification performance highlight the potential of these metrics as effective proxies for assessing model quality.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12315385
DOI: http://dx.doi.org/10.1186/s12864-025-11913-2
