Prediction of gene regulatory connections with joint single-cell foundation models and graph-based learning.

Bioinformatics

Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, United States.

Published: July 2025


Article Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) data offer unprecedented opportunities to infer gene regulatory networks (GRNs) at fine-grained resolution, shedding light on cellular phenotypes at the molecular level. However, the high sparsity, noise, and dropout events inherent in scRNA-seq data pose significant challenges for accurate and reliable GRN inference. The rapid growth in experimentally validated transcription factor (TF)-DNA binding data has enabled supervised machine learning methods, which learn patterns from known regulatory interactions and achieve high accuracy by framing GRN inference as a gene regulatory link prediction task. This study addresses the gene regulatory link prediction problem by learning vectorized representations at the gene level to predict missing regulatory interactions. Supervised methods, however, need a large amount of known TF-DNA binding data to perform well, and such data are expensive to obtain experimentally and therefore scarce. Advances in large-scale pre-training and transfer learning offer a way to address this limitation. We leverage single-cell foundation models (scFMs), large models pre-trained on extensive scRNA-seq datasets, and combine them with joint graph-based learning to establish a robust foundation for gene regulatory link prediction.
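To make the link prediction framing concrete, the following is a minimal sketch and not the scRegNet implementation. It assumes each gene already has a pre-trained embedding (a toy stand-in for scFM output), derives a second embedding by one round of neighbor averaging over known regulatory edges (a simple stand-in for a graph encoder), concatenates the two, and scores a candidate TF-target pair with a sigmoid over the dot product. All names (`mean_aggregate`, `joint_embedding`, `link_score`) and the toy data are hypothetical.

```python
import math

def mean_aggregate(features, edges):
    """Average each gene's features with those of its known neighbors.

    Edges are treated as undirected here, and each gene includes itself.
    This is an illustrative stand-in for a learned graph encoder.
    """
    n = len(features)
    neighbors = {i: [i] for i in range(n)}
    for tf, target in edges:
        neighbors[tf].append(target)
        neighbors[target].append(tf)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbors[i]) / len(neighbors[i])
         for d in range(dim)]
        for i in range(n)
    ]

def joint_embedding(pretrained, graph):
    """Concatenate the pre-trained and graph-derived vectors per gene."""
    return [p + g for p, g in zip(pretrained, graph)]

def link_score(embeddings, tf, target):
    """Sigmoid of the dot product: a score in (0, 1) for a candidate link.

    A dot product is symmetric, so this toy scorer ignores edge direction.
    """
    dot = sum(a * b for a, b in zip(embeddings[tf], embeddings[target]))
    return 1.0 / (1.0 + math.exp(-dot))

# Toy data (assumed): four genes with 2-dimensional "pre-trained" embeddings
# and two known regulatory edges, given as (TF index, target index).
pretrained = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
known_edges = [(0, 1), (0, 2)]

graph = mean_aggregate(pretrained, known_edges)
joint = joint_embedding(pretrained, graph)

# Score a candidate link that is not among the known edges.
candidate_score = link_score(joint, 1, 2)
```

In a supervised setting, a scorer of this kind would be trained on known TF-target pairs so that held-out true links score higher than non-links; the fixed dot product here only illustrates the data flow.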

Results: We propose scRegNet, a novel and effective framework that leverages scFMs with joint graph-based learning for gene regulatory link prediction. scRegNet achieves state-of-the-art results in comparison with nine baseline methods on seven scRNA-seq benchmark datasets. Additionally, scRegNet is more robust than the baseline methods on noisy training data.

Availability And Implementation: The source code is available at https://github.com/sindhura-cs/scRegNet.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261397
DOI: http://dx.doi.org/10.1093/bioinformatics/btaf217
