PROSAC as a selection tool for SO-PLS regression: A strategy for multi-block data fusion.

Anal Chim Acta

KU Leuven, Department of Biosystems, Division of Animal and Human Health Engineering, Campus Geel, Kleinhoefstraat 4, 2440, Geel, Belgium.

Published: August 2024



Article Abstract

Background: Spectral data from multiple sources can be integrated into multi-block fusion chemometric models, such as sequentially orthogonalized partial least squares (SO-PLS), to improve the prediction of sample quality features. Pre-processing techniques are often applied to mitigate extraneous variability unrelated to the response variables. However, selecting suitable pre-processing methods and identifying informative data blocks becomes increasingly complex and time-consuming as the number of blocks grows. This work addresses the efficient pre-processing, selection, and ordering of data blocks for targeted SO-PLS applications.

Results: We introduce the PROSAC-SO-PLS methodology, which employs pre-processing ensembles with response-oriented sequential alternation calibration (PROSAC). This approach identifies the best pre-processed data blocks and their sequential order for specific SO-PLS applications. The method uses a stepwise forward selection strategy, made fast by the Gram-Schmidt process, to prioritize the blocks that yield the lowest prediction residuals. To validate the approach, we present results on three empirical near-infrared (NIR) datasets. Comparative analyses were performed against partial least squares (PLS) regressions on single-block pre-processed datasets and against a methodology relying solely on PROSAC. The PROSAC-SO-PLS approach consistently outperformed these methods, yielding significantly lower prediction errors: the root-mean-squared error of prediction (RMSEP) was reduced by 5 to 25% for seven of the eight response variables analyzed.
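The stepwise forward selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes mean-centered data, scores each candidate block by the in-sample residual sum of squares of an ordinary least-squares projection (rather than a PLS model with validated prediction error), and uses hypothetical names throughout (`forward_select_blocks`, `block_a`, and so on). Each block is a list of column vectors; after a block is chosen, the response and the remaining blocks are orthogonalized against it with Gram-Schmidt.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def deflate(vec, basis):
    """Remove from `vec` its projection onto an orthonormal `basis`."""
    v = list(vec)
    for q in basis:
        c = dot(q, v)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v

def gram_schmidt(columns, tol=1e-10):
    """Orthonormal basis for the span of `columns` (modified Gram-Schmidt)."""
    basis = []
    for col in columns:
        v = deflate(col, basis)
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip columns already explained by the basis
            basis.append([vi / norm for vi in v])
    return basis

def forward_select_blocks(blocks, y, n_select):
    """Greedy forward selection of blocks (hypothetical helper).

    At each step, pick the block whose span leaves the smallest residual
    sum of squares in the current response residual, then orthogonalize
    the residual and all remaining blocks against the chosen block.
    Returns a list of (block_name, residual_sum_of_squares) in selection order.
    """
    remaining = {name: [list(c) for c in cols] for name, cols in blocks.items()}
    resid = list(y)
    order = []
    for _ in range(min(n_select, len(remaining))):
        best_name, best_basis, best_rss = None, None, float("inf")
        for name, cols in remaining.items():
            basis = gram_schmidt(cols)
            r = deflate(resid, basis)
            rss = dot(r, r)
            if rss < best_rss:
                best_name, best_basis, best_rss = name, basis, rss
        order.append((best_name, best_rss))
        resid = deflate(resid, best_basis)
        del remaining[best_name]
        remaining = {n: [deflate(c, best_basis) for c in cols]
                     for n, cols in remaining.items()}
    return order

# Toy, mean-centered data (made up for illustration): four samples,
# three single-column candidate blocks, and a response that depends
# most strongly on block_b, then block_a, then block_c.
e1 = [1.0, 1.0, -1.0, -1.0]
e2 = [1.0, -1.0, -1.0, 1.0]
e3 = [1.0, -1.0, 1.0, -1.0]
y = [2.0 * a + 1.0 * b + 0.5 * c for a, b, c in zip(e1, e2, e3)]
blocks = {"block_a": [e2], "block_b": [e1], "block_c": [e3]}

order = forward_select_blocks(blocks, y, n_select=3)
print([name for name, _ in order])  # ['block_b', 'block_a', 'block_c']
```

In a realistic setting each candidate would be a differently pre-processed version of a spectral block with many columns, and the score would come from validated prediction error rather than in-sample residuals; the sketch only shows the greedy ordering and the orthogonalization step.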

Significance: The PROSAC-SO-PLS methodology offers a versatile and efficient technique for ensemble pre-processing in NIR data modeling. It enables the use of SO-PLS with minimal concern about pre-processing sequence or block order, and it handles a large number of data blocks effectively. This streamlines data pre-processing and model building while improving the accuracy and efficiency of chemometric models.


Source
http://dx.doi.org/10.1016/j.aca.2024.342965

