Prediction of Retention Time by Combining Multiple Data Sets with Chromatographic Parameter Vectorization and Transfer Learning.

Yansong Li, Kunjie Dong, Di Yu, Dongdong Huang, Xinyu Liu, Guowang Xu, Xiaohui Lin

Anal Chem

School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, China.

Published: August 2025

Retention time (RT) provides information orthogonal to mass spectra and thereby supports qualitative identification. However, RT depends on experimental conditions and column parameters, and it is difficult to obtain a large amount of RT data under a user's own experimental conditions. Hence, various machine learning methods, including advanced deep learning approaches, have been developed for RT prediction. Most of them, however, are limited to a given column and set of operational conditions, and data sparsity often hinders prediction performance. In this study, we propose MDL-TL, a method that combines multiple data sets to jointly train a base model. MDL-TL vectorizes the column and conditions (chromatographic parameters, CPs) using word2vec and autoencoders, and distinguishes data sets from different chromatographic experiments by including the CPs in the compound representation. This not only augments the data but also introduces the CPs into RT prediction, allowing the pretrained model to be transferred efficiently to different target systems by fine-tuning. MDL-TL was evaluated against five popular deep learning approaches and four machine learning approaches on 14 reversed-phase liquid chromatography data sets and 14 hydrophilic interaction liquid chromatography data sets. In most cases, our method surpassed the compared methods, including transfer learning methods based on the METLIN small molecule retention time (SMRT) data set, in mean absolute error, median absolute error, and mean relative error, demonstrating that MDL-TL is a promising approach for predicting RTs across various chromatographic systems and operational conditions.
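The idea of pooling data sets by placing the CPs inside the compound representation, and the three error measures named in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature and CP vectors, and the choice to define mean relative error as absolute error divided by the measured RT are all assumptions made for illustration. In the paper the CP vectors come from word2vec and autoencoders; here they are hard-coded placeholders.

```python
from statistics import mean, median


def attach_cp(compound_features, cp_vector):
    """Append one data set's CP vector to every compound feature row."""
    return [list(row) + list(cp_vector) for row in compound_features]


def pool_datasets(datasets):
    """Merge (features, cp_vector, rts) triples into one joint training set."""
    pooled_x, pooled_y = [], []
    for features, cp_vector, rts in datasets:
        pooled_x.extend(attach_cp(features, cp_vector))
        pooled_y.extend(rts)
    return pooled_x, pooled_y


def rt_errors(rt_true, rt_pred):
    """Compute mean absolute, median absolute, and mean relative error.

    Relative error is taken as |pred - true| / true (an assumed definition).
    """
    abs_err = [abs(p - t) for t, p in zip(rt_true, rt_pred)]
    rel_err = [e / t for e, t in zip(abs_err, rt_true)]
    return {
        "mean_absolute_error": mean(abs_err),
        "median_absolute_error": median(abs_err),
        "mean_relative_error": mean(rel_err),
    }


# Hypothetical toy data: two chromatographic systems with different CP vectors.
set_a = ([[0.1, 0.2], [0.3, 0.4]], [1.0, 0.0, 0.5], [2.0, 4.0])
set_b = ([[0.5, 0.6]], [0.0, 1.0, 0.25], [10.0])

X, y = pool_datasets([set_a, set_b])
errors = rt_errors(rt_true=[2.0, 4.0, 10.0], rt_pred=[2.5, 3.0, 10.0])
```

Because every row of `X` carries the CP vector of the system it was measured on, a single model trained on the pooled rows can tell the systems apart, which is the property that makes later fine-tuning on a new target system plausible.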

DOI: http://dx.doi.org/10.1021/acs.analchem.5c01703
