Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography-Mass Spectrometry.

Ying Zhang, Sili Fan, Gert Wohlgemuth, Oliver Fiehn

Metabolites

West Coast Metabolomics Center, UC Davis, 451 Health Sciences Drive, Davis, CA 95616, USA.

Published: August 2023

Large-scale metabolomics assays are widely used in epidemiology for biomarker discovery and risk assessment. However, systematic errors introduced by instrumental signal drift pose a major challenge in large-scale assays, especially for derivatization-based gas chromatography-mass spectrometry (GC-MS). Here, we compare different normalization methods on a type 2 diabetes cohort study comprising more than 4000 human plasma samples, in addition to 413 pooled quality control (QC) samples, 413 commercial pooled plasma samples (BioIVT), and a set of 25 stable isotope-labeled internal standards used in every sample. Data acquisition spanned 1.2 years and included seven column changes. The 413 pooled QC samples served as the training set and the 413 BioIVT samples as the validation set for normalization comparisons. Surprisingly, neither internal standard nor sum-based normalization yielded a median precision below 30% relative standard deviation (RSD) across all 563 metabolite annotations. While the machine-learning-based SERRF algorithm gave 19% median precision on the pooled QC samples, external cross-validation with BioIVT plasma pools yielded a median RSD of 34%. We developed a new method: systematic error reduction by denoising autoencoder (SERDA). SERDA lowered the median RSD of the training QC samples to 16% and yielded an overall error of 19% RSD when applied to the independent BioIVT validation QC samples. This is the largest study on GC-MS metabolomics ever reported, demonstrating that technical errors can be normalized and handled effectively for this assay. SERDA was further validated on two additional large-scale GC-MS-based human plasma metabolomics studies, confirming its superior performance over SERRF and sum normalization.
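
The precision figures quoted above are median RSD values taken across metabolites. As a minimal sketch of how such a figure can be computed, and of what a generic sum-based normalization does, the Python below uses NumPy on synthetic data. The function names `median_rsd` and `sum_normalize`, the matrix layout (rows are QC injections, columns are metabolites), and the simulated drift are assumptions made for illustration; they are not taken from the paper or from the SERDA or SERRF code.

```python
import numpy as np


def median_rsd(intensities):
    """Median relative standard deviation (%) across metabolites.

    intensities: 2-D array, rows = QC injections, columns = metabolites.
    """
    x = np.asarray(intensities, dtype=float)
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)  # sample standard deviation per metabolite
    return float(np.median(100.0 * sd / mean))


def sum_normalize(intensities):
    """Rescale each sample so its total signal equals the mean total signal."""
    x = np.asarray(intensities, dtype=float)
    totals = x.sum(axis=1, keepdims=True)
    return x * (totals.mean() / totals)


# Synthetic QC data (illustrative only): 50 injections of one pooled sample,
# 20 metabolites, a shared multiplicative drift, and 2% random noise.
rng = np.random.default_rng(0)
true_levels = rng.uniform(100.0, 1000.0, size=20)
drift = np.linspace(1.0, 2.0, 50)[:, None]
noise = rng.normal(1.0, 0.02, size=(50, 20))
qc = true_levels * drift * noise

raw_rsd = median_rsd(qc)
norm_rsd = median_rsd(sum_normalize(qc))
print(f"median RSD before: {raw_rsd:.1f}%, after sum normalization: {norm_rsd:.1f}%")
```

The simulated drift is identical for every metabolite, which is the idealized case where sum normalization works well. The abstract reports that sum-based normalization did not reach 30% median RSD on the real data, which suggests that the actual drift was not a single shared factor.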

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10456436
DOI: http://dx.doi.org/10.3390/metabo13080944

Similar Publications

BiU-Net: A Biologically Informed U-Net for Genotype Imputation.

Res Sq

August 2025

Lei Huang, Kuan-Jui Su, Meng Song, Chuan Qiu, Loren Gragert

Missing genotypes reduce statistical power and hinder genome-wide association studies. While reference-based methods are popular, they struggle in complex regions and under population mismatch. Existing reference-free deep learning models show promise in addressing this issue but often fail to impute rare variants in small datasets.

DiffRaman: A conditional latent denoising diffusion probabilistic model for enhancing bacterial identification via Raman spectra generation under limited data.

Anal Chim Acta

October 2025

State Key Laboratory of Precision Measurement Technology and Instruments, Tsinghua University, Beijing, 100084, China.

Haiming Yao, Wei Luo, Ang Gao, Tao Zhou, Xue Wang

Raman spectroscopy has attracted significant attention in various biochemical detection fields, especially in the rapid identification of pathogenic bacteria. The integration of this technology with deep learning to facilitate automated bacterial Raman spectroscopy diagnosis has emerged as a key focus in recent research. However, the diagnostic performance of existing deep learning methods largely depends on a sufficient dataset, and in scenarios where there is a limited availability of Raman spectroscopy data, it is inadequate to fully optimize the numerous parameters of deep neural networks.

RePaint High-Density Surface Electromyography Signal Using Denoising Diffusion Probabilistic Model.

IEEE Trans Biomed Eng

September 2025

Yihui Zhao, Jiawei Liao, Xia Fang, Hai Wang, Ning Jiang

Objective: High-density surface electromyography (HD-sEMG) has emerged as a powerful tool for myoelectric control and activation pattern analysis. However, signal loss due to poor electrode contact and channel corruption remains a significant challenge, limiting the reliability and practical applications of HD-sEMG signals. Conventional interpolation methods fail to effectively reconstruct corrupted signals, especially when multiple adjacent channels are affected.

Explainable Deep Learning Framework for SERS Bioquantification.

ACS Sens

September 2025

Melville Laboratory for Polymer Synthesis, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, U.K.

Jihan K Zaki, Jakub Tomasik, Jade A McCune, Sabine Bahn, Pietro Lió

Surface-enhanced Raman spectroscopy (SERS) is rapidly gaining attention as a fast and inexpensive method of biomarker quantification, which can be combined with deep learning to elucidate complex biomarker-disease relationships. Current standard practices in SERS analysis are behind the state-of-the-art machine learning approaches; however, the present challenges of SERS analysis could be effectively addressed with a robust computational framework. Furthermore, there is a need for improved model explainability for SERS analysis, which at present is insufficient in assessing the contexts in which confounding factors affect prediction outcomes.

CNN-LSTM-AM approach for outdoor wireless optical communication systems.

Sci Rep

September 2025

Department of Computer Engineering, Faculty of Engineering, Pharos University, Canal El Mahmoudia Street, Beside Green Plaza Complex 21648, Alexandria, Egypt.

Montaser Abdelsattar, Eman S Amer, Hamdy A Ziedan, Wessam M Salama

This paper presents an enhancement of visible light communication (VLC) for vehicle-to-vehicle (V2V) links using artificial intelligence models. Different V2V scenarios are simulated. The first scenario considers a specific longitudinal separation and a variable lateral shift between vehicles.
