A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10111747 | PMC |
| http://dx.doi.org/10.1186/s13059-023-02915-y | DOI Listing |
Missing genotypes reduce statistical power and hinder genome-wide association studies. While reference-based methods are popular, they struggle in complex regions and under population mismatch. Existing reference-free deep learning models show promise in addressing this issue but often fail to impute rare variants in small datasets.
Sci Rep
September 2025
Data Analytics, Generation Australia, Sydney, 2000, Australia.
Carotid Intima-Media Thickness (CIMT) is a non-invasive and well-validated sign of asymptomatic atherosclerosis and an early predictor of cardiovascular disease (CVD). We assembled a carefully curated dataset of 100 adult patients, encompassing 13 clinical, biochemical and demographic variables routinely collected in outpatient practice. After a five-stage pre-processing pipeline (median/mode imputation, categorical encoding, Min-Max scaling, inter-quartile-range outlier removal and SMOTE-NC balancing), we trained a Kolmogorov-Arnold Network (KAN) to assign each patient to one of four CIMT-defined risk tiers: "No", "Low", "Medium" and "High".
Bioinform Adv
July 2025
Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan.
Motivation: The gut microbiota interacts closely with the host, playing crucial roles in maintaining health. Analysing time-series genomic data enables the investigation of dynamic microbiota changes. However, missing values create significant analytical challenges.
IEEE Trans Comput Biol Bioinform
March 2025
It is well known that single-cell RNA sequencing (scRNA-seq) data contain a significant number of missing values due to technical variability. These missing values present a major challenge in scRNA-seq analysis, especially by complicating the identification of cell types via clustering. To address this issue, various methods have been developed to impute the missing data in scRNA-seq clustering.
Sensors (Basel)
July 2025
China Nuclear Power Operation Technology Corporation, Wuhan 430233, China.
To ensure the safety of nuclear power production, nuclear power plants deploy numerous sensors to monitor various physical indicators during production, enabling the early detection of anomalies. Efficient anomaly detection relies on complete sensor data. However, compared to conventional energy sources, the extreme physical environment of nuclear power plants is more likely to negatively impact the normal operation of sensors, compromising the integrity of the collected data.