Article Synopsis

  • A new approach to genomics experiments involves performing fewer experiments and using computational methods to fill in the gaps, but there are still open questions about which imputation methods work best and how to evaluate their performance.
  • The study analyzes 23 methods from the ENCODE Imputation Challenge and finds that assessing these methods is complicated by factors such as changes in data collection practices over time, the amount of available data, and redundancy among evaluation metrics.
  • The authors suggest practical solutions to these challenges and highlight promising directions for future research to improve the robustness of imputation methods in genomics.

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
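The abstract's point about redundancy among performance measures can be made concrete: if two measures rank a set of imputation methods almost identically, one of them adds little information. The following Python sketch, on synthetic data, scores a handful of mock methods with three common track-level measures and then correlates the rankings they induce. The measure set, method names, and data are illustrative assumptions, not the challenge's actual scoring pipeline.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative sketch only: this is NOT the ENCODE Imputation Challenge's
# scoring code. We fabricate one observed track and five mock "methods",
# score each method under three common measures, then check how redundant
# the measures are by correlating the method rankings they induce.

rng = np.random.default_rng(0)
n_bins = 10_000                              # genomic bins in a held-out track
observed = rng.gamma(2.0, 1.0, n_bins)       # synthetic signal track

# Mock imputations: the true signal plus increasing amounts of noise.
methods = {f"method_{i}": observed + rng.normal(0.0, 0.5 * (i + 1), n_bins)
           for i in range(5)}

measures = {
    "mse":      lambda y, yhat: float(np.mean((y - yhat) ** 2)),
    "pearson":  lambda y, yhat: pearsonr(y, yhat)[0],
    "spearman": lambda y, yhat: spearmanr(y, yhat)[0],
}

# scores[measure][method] -> scalar performance on the held-out track.
scores = {m: {name: fn(observed, pred) for name, pred in methods.items()}
          for m, fn in measures.items()}

# Redundancy check: correlate the rankings each pair of measures assigns
# to the five methods. |rho| near 1 means the measures are redundant.
names = list(measures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        rho, _ = spearmanr([scores[a][m] for m in methods],
                           [scores[b][m] for m in methods])
        print(f"{a} vs {b}: rank correlation rho = {rho:+.2f}")
```

A rank correlation near +1 or -1 between two measures suggests they are largely interchangeable for ranking methods, so reporting both adds little beyond reporting one.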


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10111747
DOI: http://dx.doi.org/10.1186/s13059-023-02915-y

Publication Analysis

Top Keywords

encode imputation (8)
imputation challenge (8)
challenge critical (4)
critical assessment (4)
methods (4)
assessment methods (4)
methods cross-cell (4)
cross-cell type (4)
imputation (4)
type imputation (4)

Similar Publications

Missing genotypes reduce statistical power and hinder genome-wide association studies. While reference-based methods are popular, they struggle in complex regions and under population mismatch. Existing reference-free deep learning models show promise in addressing this issue but often fail to impute rare variants in small datasets.


Carotid Intima-Media Thickness (CIMT) is a non-invasive and well-validated marker of asymptomatic atherosclerosis and an early predictor of cardiovascular disease (CVD). We assembled a carefully curated dataset of 100 adult patients, encompassing 13 clinical, biochemical and demographic variables routinely collected in outpatient practice. After a five-stage pre-processing pipeline (median/mode imputation, categorical encoding, Min-Max scaling, inter-quartile-range outlier removal and SMOTE-NC balancing), we trained a Kolmogorov-Arnold Network (KAN) to assign each patient to one of four CIMT-defined risk tiers labelled "No", "Low", "Medium" and "High".
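For readers who want to see what such a pipeline looks like in code, here is a minimal sketch using scikit-learn and imbalanced-learn. The column names are hypothetical placeholders (the snippet above does not list the paper's 13 variables), the stage order follows the abstract, and the KAN classifier itself is omitted.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder
from imblearn.over_sampling import SMOTENC

# Hypothetical feature names; the paper's actual 13 variables are not listed.
num_cols = ["age", "bmi", "ldl"]
cat_cols = ["sex", "smoker"]

def preprocess(df: pd.DataFrame, y: pd.Series):
    """Five-stage pipeline sketched from the abstract: median/mode imputation,
    categorical encoding, Min-Max scaling, IQR outlier removal, SMOTE-NC."""
    # 1. Median imputation for numeric features, mode for categorical ones.
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
    df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

    # 2. Encode categorical variables as integer codes (SMOTE-NC needs the
    #    categorical columns identified, not one-hot expanded).
    df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])

    # 3. Min-Max scale numeric features into [0, 1].
    df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

    # 4. Inter-quartile-range outlier removal on numeric features.
    q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
    iqr = q3 - q1
    keep = ((df[num_cols] >= q1 - 1.5 * iqr) &
            (df[num_cols] <= q3 + 1.5 * iqr)).all(axis=1)
    df, y = df[keep], y[keep]

    # 5. SMOTE-NC rebalances the four risk tiers, synthesizing new samples
    #    while respecting the categorical columns.
    cat_idx = [df.columns.get_loc(c) for c in cat_cols]
    return SMOTENC(categorical_features=cat_idx, random_state=0).fit_resample(df, y)
```

Note that balancing comes last, after outlier removal, so synthetic minority-class samples are generated only from cleaned, scaled data.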


Diffusion model for imputing time-series gut microbiome profiles using phylogenetic information and metadata integration.

Bioinform Adv

July 2025

Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan.

Motivation: The gut microbiota interacts closely with the host, playing crucial roles in maintaining health. Analysing time-series genomic data enables the investigation of dynamic microbiota changes. However, missing values create significant analytical challenges.


Single-cell RNA sequencing (scRNA-seq) data notoriously contain a significant number of missing values due to technical variability. These missing values present a major challenge in scRNA-seq analysis, especially by complicating the identification of cell types via clustering. To address this issue, various methods have been developed to impute the missing data for scRNA-seq clustering.


Anomaly Detection in Nuclear Power Production Based on Neural Normal Stochastic Process.

Sensors (Basel)

July 2025

China Nuclear Power Operation Technology Corporation, Wuhan 430233, China.

To ensure the safety of nuclear power production, nuclear power plants deploy numerous sensors to monitor various physical indicators during production, enabling the early detection of anomalies. Efficient anomaly detection relies on complete sensor data. However, compared to conventional energy sources, the extreme physical environment of nuclear power plants is more likely to negatively impact the normal operation of sensors, compromising the integrity of the collected data.
