Article Synopsis

  • A new approach to genomics experiments involves performing fewer experiments and using computational methods to fill in the gaps, but there are still open questions about which imputation methods work best and how to evaluate their performance.
  • The study analyzes 23 methods from the ENCODE Imputation Challenge and finds that assessing these methods is complicated by factors such as changes in data collection practices over time, the amount of available data, and redundancy among evaluation metrics.
  • The authors suggest practical solutions to these challenges and highlight promising directions for future research to improve the robustness of imputation methods in genomics.

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
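The abstract's point about redundancy among performance measures can be made concrete: if two measures rank a set of imputation methods almost identically, one of them adds little information. The following Python sketch, on synthetic data, scores a handful of mock methods with three common track-level measures and then correlates the rankings they induce. The measure set, method names, and data are illustrative assumptions, not the challenge's actual scoring pipeline.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative sketch only: this is NOT the ENCODE Imputation Challenge's
# scoring code. We fabricate one observed track and five mock "methods",
# score each method under three common measures, then check how redundant
# the measures are by correlating the method rankings they induce.

rng = np.random.default_rng(0)
n_bins = 10_000                              # genomic bins in a held-out track
observed = rng.gamma(2.0, 1.0, n_bins)       # synthetic signal track

# Mock imputations: the true signal plus increasing amounts of noise.
methods = {f"method_{i}": observed + rng.normal(0.0, 0.5 * (i + 1), n_bins)
           for i in range(5)}

measures = {
    "mse":      lambda y, yhat: float(np.mean((y - yhat) ** 2)),
    "pearson":  lambda y, yhat: pearsonr(y, yhat)[0],
    "spearman": lambda y, yhat: spearmanr(y, yhat)[0],
}

# scores[measure][method] -> scalar performance on the held-out track.
scores = {m: {name: fn(observed, pred) for name, pred in methods.items()}
          for m, fn in measures.items()}

# Redundancy check: correlate the rankings each pair of measures assigns
# to the five methods. |rho| near 1 means the measures are redundant.
names = list(measures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        rho, _ = spearmanr([scores[a][m] for m in methods],
                           [scores[b][m] for m in methods])
        print(f"{a} vs {b}: rank correlation rho = {rho:+.2f}")
```

A rank correlation near +1 or -1 between two measures suggests they are largely interchangeable for ranking methods, so reporting both adds little beyond reporting one.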


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10111747
DOI: http://dx.doi.org/10.1186/s13059-023-02915-y

Publication Analysis

Top Keywords

encode imputation (8)
imputation challenge (8)
challenge critical (4)
critical assessment (4)
methods (4)
assessment methods (4)
methods cross-cell (4)
cross-cell type (4)
imputation (4)
type imputation (4)

Similar Publications

Missing genotypes reduce statistical power and hinder genome-wide association studies. While reference-based methods are popular, they struggle in complex regions and under population mismatch. Existing reference-free deep learning models show promise in addressing this issue but often fail to impute rare variants in small datasets.


Carotid Intima-Media Thickness (CIMT) is a non-invasive and well-validated marker of asymptomatic atherosclerosis and an early predictor of cardiovascular disease (CVD). We assembled a carefully curated dataset of 100 adult patients, encompassing 13 clinical, biochemical and demographic variables routinely collected in outpatient practice. After a five-stage pre-processing pipeline (median/mode imputation, categorical encoding, Min-Max scaling, inter-quartile-range outlier removal and SMOTE-NC balancing), we trained a Kolmogorov-Arnold Network (KAN) to assign each patient to one of four CIMT-defined risk tiers labelled "No", "Low", "Medium" and "High".
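For readers who want to see what such a pipeline looks like in code, here is a minimal sketch using scikit-learn and imbalanced-learn. The column names are hypothetical placeholders (the snippet above does not list the paper's 13 variables), the stage order follows the abstract, and the KAN classifier itself is omitted.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder
from imblearn.over_sampling import SMOTENC

# Hypothetical feature names; the paper's actual 13 variables are not listed.
num_cols = ["age", "bmi", "ldl"]
cat_cols = ["sex", "smoker"]

def preprocess(df: pd.DataFrame, y: pd.Series):
    """Five-stage pipeline sketched from the abstract: median/mode imputation,
    categorical encoding, Min-Max scaling, IQR outlier removal, SMOTE-NC."""
    # 1. Median imputation for numeric features, mode for categorical ones.
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
    df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

    # 2. Encode categorical variables as integer codes (SMOTE-NC needs the
    #    categorical columns identified, not one-hot expanded).
    df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])

    # 3. Min-Max scale numeric features into [0, 1].
    df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

    # 4. Inter-quartile-range outlier removal on numeric features.
    q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
    iqr = q3 - q1
    keep = ((df[num_cols] >= q1 - 1.5 * iqr) &
            (df[num_cols] <= q3 + 1.5 * iqr)).all(axis=1)
    df, y = df[keep], y[keep]

    # 5. SMOTE-NC rebalances the four risk tiers, synthesizing new samples
    #    while respecting the categorical columns.
    cat_idx = [df.columns.get_loc(c) for c in cat_cols]
    return SMOTENC(categorical_features=cat_idx, random_state=0).fit_resample(df, y)
```

Note that balancing comes last, after outlier removal, so synthetic minority-class samples are generated only from cleaned, scaled data.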


Diffusion model for imputing time-series gut microbiome profiles using phylogenetic information and metadata integration.

Bioinform Adv

July 2025

Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan.

Motivation: The gut microbiota interacts closely with the host, playing crucial roles in maintaining health. Analysing time-series genomic data enables the investigation of dynamic microbiota changes. However, missing values create significant analytical challenges.


Single-cell RNA sequencing (scRNA-seq) data notoriously contain a significant number of missing values due to technical variability. These missing values present a major challenge in scRNA-seq analysis, especially by complicating the identification of cell types via clustering. To address this issue, various methods have been developed to impute the missing data for scRNA-seq clustering.


Anomaly Detection in Nuclear Power Production Based on Neural Normal Stochastic Process.

Sensors (Basel)

July 2025

China Nuclear Power Operation Technology Corporation, Wuhan 430233, China.

To ensure the safety of nuclear power production, nuclear power plants deploy numerous sensors to monitor various physical indicators during production, enabling the early detection of anomalies. Efficient anomaly detection relies on complete sensor data. However, compared to conventional energy sources, the extreme physical environment of nuclear power plants is more likely to negatively impact the normal operation of sensors, compromising the integrity of the collected data.
