The ENmix DNA methylation analysis pipeline for Illumina BeadChip and comparisons with seven other preprocessing pipelines.

Clin Epigenetics

Epidemiology Branch, National Institute of Environmental Health Sciences, NIH, MD A3-05, 111 T.W. Alexander Drive, PO Box 12233, Research Triangle Park, NC, 27709, USA.

Published: December 2021



Article Abstract

Background: Illumina DNA methylation arrays are high-throughput platforms for cost-effective genome-wide profiling of individual CpGs. Experimental and technical factors introduce appreciable measurement variation, some of which can be mitigated by careful "preprocessing" of raw data.
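For readers new to these arrays, the "raw data" being preprocessed are per-CpG methylated (M) and unmethylated (U) signal intensities, which are usually summarized as a beta value or an M-value. The sketch below is illustrative Python, not ENmix code (ENmix is an R package). It assumes the common Illumina convention beta = M / (M + U + 100) and a log2 ratio M-value with a small hypothetical offset `alpha`.

```python
import numpy as np

def beta_values(meth, unmeth, offset=100.0):
    """Beta value per CpG: M / (M + U + offset).

    offset=100 follows the common Illumina convention (assumption for this sketch);
    negative intensities, which can appear after background correction, are clipped to 0.
    """
    m = np.clip(np.asarray(meth, dtype=float), 0, None)
    u = np.clip(np.asarray(unmeth, dtype=float), 0, None)
    return m / (m + u + offset)

def m_values(meth, unmeth, alpha=1.0):
    """M-value per CpG: log2((M + alpha) / (U + alpha)); alpha is a hypothetical small offset."""
    m = np.clip(np.asarray(meth, dtype=float), 0, None)
    u = np.clip(np.asarray(unmeth, dtype=float), 0, None)
    return np.log2((m + alpha) / (u + alpha))

# Example: a strongly methylated CpG and a mostly unmethylated one.
print(beta_values([900, 100], [0, 800]))
print(m_values([3, 1], [1, 3]))
```

Preprocessing steps such as background correction and dye-bias correction act on M and U before these summaries are computed, which is why they can change every downstream beta value.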

Methods: Here we describe the ENmix preprocessing pipeline and compare it with seven published alternative pipelines (ChAMP, Illumina, SWAN, Funnorm, Noob, wateRmelon, and RnBeads). We use two large sets of duplicate sample measurements on 450K and EPIC arrays, along with mixtures of isogenic methylated and unmethylated cell line DNA, to compare raw data with data preprocessed by each pipeline.
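The mixture experiments work because the expected methylation level of each sample is known by design. The sketch below is a hypothetical illustration, not the authors' evaluation code. It assumes linear mixing, so a sample containing a fraction `p` of methylated DNA is expected to have beta = p * beta_meth + (1 - p) * beta_unmeth, where `beta_meth` and `beta_unmeth` default to the idealized values 1 and 0 but could instead be the measured pure-sample levels per CpG.

```python
import numpy as np

def expected_mixture_beta(p, beta_meth=1.0, beta_unmeth=0.0):
    """Expected beta under a linear-mixing assumption for methylated fraction p."""
    return p * np.asarray(beta_meth, dtype=float) + (1 - p) * np.asarray(beta_unmeth, dtype=float)

def mixture_deviation(observed, p, beta_meth=1.0, beta_unmeth=0.0):
    """Mean absolute deviation of observed betas from the expected mixture level."""
    expected = expected_mixture_beta(p, beta_meth, beta_unmeth)
    return float(np.mean(np.abs(np.asarray(observed, dtype=float) - expected)))

# Example: a 50% mixture measured at two CpGs, using idealized pure levels 1 and 0.
print(mixture_deviation([0.5, 0.6], p=0.5))
```

A smaller deviation across the mixture series indicates that a pipeline recovers the designed methylation levels more faithfully.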

Results: Our evaluations show that the ENmix pipeline performs best, with significantly higher correlation and lower absolute difference between duplicate pairs, higher intraclass correlation coefficients (ICC), and smaller deviations from the expected methylation levels in the mixture experiments. In addition to the pipeline function, the ENmix software provides an integrated set of functions for reading raw data files from mouse and human arrays, quality control, data preprocessing, visualization, detection of differentially methylated regions (DMRs), estimation of cell type proportions, and calculation of methylation age clocks. ENmix is computationally efficient and flexible, and it supports parallel computing. To facilitate further evaluations, we make all datasets and evaluation code publicly available.
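The duplicate-pair metrics named above can be computed directly from replicate beta values. This is an illustrative Python sketch, not the authors' published evaluation code. The choice of a one-way random-effects ICC(1,1) is an assumption made here; the paper may use a different ICC formulation. Correlation and absolute difference are computed across CpGs within one duplicate pair, while the ICC is computed for one CpG across all duplicated subjects.

```python
import numpy as np

def pair_correlation(rep1, rep2):
    """Pearson correlation between two replicate beta vectors (one pair, many CpGs)."""
    return float(np.corrcoef(rep1, rep2)[0, 1])

def mean_abs_difference(rep1, rep2):
    """Mean absolute difference between two replicate beta vectors."""
    return float(np.mean(np.abs(np.asarray(rep1, dtype=float) - np.asarray(rep2, dtype=float))))

def icc_oneway(x1, x2):
    """One-way random-effects ICC(1,1) for one CpG measured in duplicate.

    x1[i] and x2[i] are the two measurements for subject i (assumed formulation).
    """
    data = np.column_stack([x1, x2]).astype(float)  # shape (n_subjects, k=2)
    n, k = data.shape
    grand_mean = data.mean()
    subject_means = data.mean(axis=1)
    # Between-subject and within-subject mean squares.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))

# Example: perfectly reproducible duplicates give ICC = 1.
print(icc_oneway([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))
```

Higher correlation, lower absolute difference, and ICC closer to 1 all indicate that technical noise is small relative to true between-sample variation.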

Conclusion: Careful selection of robust data preprocessing methods is critical for DNA methylation array studies. In our evaluations, ENmix outperformed the other pipelines in minimizing experimental variation and improving data quality and study power.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662917
DOI: http://dx.doi.org/10.1186/s13148-021-01207-1

