Reconstructing 3D chromosome structures from single-cell Hi-C data with SO(3)-equivariant graph neural networks.

NAR Genom Bioinform

Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States.

Published: March 2025



Article Abstract

The spatial conformation of chromosomes and genomes in single cells is relevant to cellular function and useful for elucidating the mechanisms underlying gene expression and genome methylation. The chromosomal contacts (i.e. chromosomal regions in spatial proximity) entailing the three-dimensional (3D) structure of the genome of a single cell can be obtained by single-cell chromosome conformation capture techniques, such as single-cell Hi-C (ScHi-C). However, because chromosomal contacts in ScHi-C data are sparse, it remains challenging for traditional 3D conformation optimization methods to reconstruct 3D chromosome structures from such data. Here, we present a machine learning method based on a novel SO(3)-equivariant graph neural network (HiCEGNN) to reconstruct 3D structures of chromosomes of single cells from ScHi-C data. HiCEGNN consistently outperforms both the traditional optimization methods and the only other deep learning method across diverse cells, different structural resolutions, and different noise levels of the data. Moreover, HiCEGNN is robust against the noise in the ScHi-C data.
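To make the term "SO(3)-equivariant" concrete, the following minimal sketch shows a generic coordinate update in the style of equivariant graph neural networks and checks numerically that rotating the input rotates the output in the same way. It is not the HiCEGNN architecture: the function names, the toy linear edge function, and the weight vector `w` are all assumptions made for illustration only.

```python
import numpy as np

def equivariant_coordinate_update(x, h, w):
    """One generic equivariant coordinate update on a fully connected graph.

    x: (n, 3) bead coordinates; h: (n, d) rotation-invariant node features;
    w: (2*d + 1,) weights of a toy linear edge function (hypothetical).
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]                # (n, n, 3) vectors x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)   # (n, n, 1), unchanged by rotation
    hi = np.broadcast_to(h[:, None, :], (n, n, d))
    hj = np.broadcast_to(h[None, :, :], (n, n, d))
    edge_in = np.concatenate([hi, hj, dist2], axis=-1)  # invariant edge inputs
    scale = np.tanh(edge_in @ w)                        # (n, n) invariant scalar per edge
    np.fill_diagonal(scale, 0.0)                        # no self-interaction
    # Each bead moves along relative vectors weighted by invariant scalars,
    # so the update commutes with any rotation of the input coordinates.
    return x + (diff * scale[..., None]).sum(axis=1) / (n - 1)

def random_rotation(rng):
    """Sample a 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))   # toy coordinates for 6 chromosomal bins
h = rng.normal(size=(6, 4))   # toy invariant features
w = rng.normal(size=(9,))     # 2*4 + 1 edge-function weights
R = random_rotation(rng)

rotate_then_update = equivariant_coordinate_update(x @ R.T, h, w)
update_then_rotate = equivariant_coordinate_update(x, h, w) @ R.T
print(np.abs(rotate_then_update - update_then_rotate).max())
```

The practical appeal of this property for structure reconstruction is that a predicted 3D structure has no preferred orientation, so a model built from such updates does not have to learn the same structure separately for every rotation.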


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928942
DOI: http://dx.doi.org/10.1093/nargab/lqaf027


Similar Publications

High-throughput single-cell Hi-C (scHi-C) technologies have opened new avenues for investigating cell-to-cell variability in the three-dimensional organization of the genome within individual nuclei. Despite their potential, analyses of scHi-C data are hindered by data sparsity, which varies substantially across cells. To address this challenge, recent methods aim to denoise scHi-C data and differentiate between two types of zero entries: structural zeros (SZs), which reflect true absence of contacts due to biological structure, and dropouts (DOs), which arise from insufficient sequencing depth.


Hi-C and single-cell Hi-C (scHi-C) data are now routinely generated for studying an array of biological questions of interest, including whole-genome chromatin organization, to gain a better understanding of the chromosome's three-dimensional hierarchical structure: compartments, Topologically Associated Domains (TADs), and long-range interactions. Due to concerns about data quality, especially for scHi-C because of its sparsity, data quality improvement is seen as a necessary step before performing analyses to answer biological questions. Methods have been developed accordingly; among them is a set of "random walk"-based methods, including random walk with a limited number of steps (RWS) and random walk with restart (RWR).
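As an illustration of the random walk with restart (RWR) idea mentioned above, the sketch below smooths a small toy contact matrix with the standard RWR iteration. It is a generic textbook formulation, not the specific procedure evaluated in the article: the function name, the restart probability of 0.5, and the added self-loops (used here only to avoid all-zero rows) are assumptions.

```python
import numpy as np

def random_walk_with_restart(contacts, restart=0.5, tol=1e-8, max_iter=1000):
    """Smooth a symmetric contact matrix with a generic RWR iteration.

    Iterates Q <- (1 - restart) * Q @ P + restart * I, where P is the
    row-normalized contact matrix, until the largest change is below tol.
    """
    a = np.asarray(contacts, dtype=float)
    n = a.shape[0]
    a = a + np.eye(n)                      # assumption: self-loops avoid empty rows
    p = a / a.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    identity = np.eye(n)
    q = identity.copy()
    for _ in range(max_iter):
        q_next = (1.0 - restart) * q @ p + restart * identity
        if np.abs(q_next - q).max() < tol:
            return q_next
        q = q_next
    return q

# Toy 4-bin contact map: bins 0-1 and 1-2 are in contact, bin 3 is isolated.
contacts = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
smoothed = random_walk_with_restart(contacts)
print(np.round(smoothed, 3))
```

In the smoothed matrix, bins 0 and 2 receive a nonzero score through their shared neighbor, which is the sense in which random-walk methods fill in entries of a sparse contact map. A random walk with a limited number of steps (RWS) would instead stop after a fixed number of multiplications by `p` rather than iterating to convergence.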


Topologically associating domains (TADs) uncovered on bulk Hi-C data are regarded as fundamental building blocks of a three-dimensional genome, and they are believed to effectively participate in the regulatory programs of gene expression. The computational analysis of TADs on single-cell Hi-C (scHi-C) data in the era of single-cell transcriptomics has received continuous attention since it may provide information beyond that on bulk Hi-C data. Unfortunately, the contact matrix for a single cell is ultra-sparse due to the low sequencing depth.


Unicorn: enhancing single-cell Hi-C data with blind super-resolution for 3D genome structure reconstruction.

Bioinformatics

July 2025

Department of Computer Science, University of Colorado, Colorado Springs, 1420 Austin Bluffs Parkway, Colorado Springs, CO, 80918, United States.

Motivation: Single-cell Hi-C (scHi-C) data provide critical insights into chromatin interactions at individual cell levels, uncovering unique genomic 3D structures. However, scHi-C datasets are characterized by sparsity and noise, complicating efforts to accurately reconstruct high-resolution chromosomal structures. In this study, we present ScUnicorn, a novel blind super-resolution framework for scHi-C data enhancement.


DeepNanoHi-C: deep learning enables accurate single-cell nanopore long-read data analysis and 3D genome interpretation.

Nucleic Acids Res

July 2025

School of Artificial Intelligence, Jilin University, 2699 Qianjin Street, Chaoyang District, Changchun, Jilin 130015, China.

Single-cell long-read concatemer sequencing (scNanoHi-C) technology provides unique insights into the higher-order chromatin structure across the genome in individual cells, crucial for understanding 3D genome organization. However, the lack of specialized analytical tools for scNanoHi-C data impedes progress, as existing methods, which primarily focus on scHi-C technologies, do not fully address the specific challenges of scNanoHi-C, such as sparsity, cell-specific variability, and complex chromatin interaction networks. Here, we introduce DeepNanoHi-C, a novel deep learning framework specifically designed for scNanoHi-C data, which leverages a multistep autoencoder and a Sparse Gated Mixture of Experts (SGMoE) to accurately predict chromatin interactions by imputing sparse contact maps, thereby capturing cell-specific structural features.
