The role of data partitioning on the performance of EEG-based deep learning models in supervised cross-subject analysis: A preliminary study.

Comput Biol Med

Department of Neuroscience, University of Padua, Padua, 35121, Italy; Padova Neuroscience Center, University of Padua, Padua, 35129, Italy; Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO Valais), Sierre, 3960, Switzerland. Electronic address: manfredo.atzor

Published: September 2025


Article Abstract

Deep learning is significantly advancing the analysis of electroencephalography (EEG) data by effectively discovering highly nonlinear patterns within the signals. Data partitioning and cross-validation are crucial for assessing model performance and ensuring study comparability, because different partitioning choices can produce varied results and, owing to specific signal properties (e.g., biometric ones), data leakage. Such variability in model evaluation leads to incomparable studies and, increasingly, to overestimated performance claims, which are detrimental to the field. Nevertheless, no comprehensive guidelines for proper data partitioning and cross-validation exist in the domain, nor is there a quantitative evaluation of how different approaches affect model accuracy, reliability, and generalizability. To assist researchers in identifying optimal experimental strategies, this paper thoroughly investigates the role of data partitioning and cross-validation in evaluating EEG deep learning models. Five cross-validation settings are compared across three supervised cross-subject classification tasks (brain-computer interfaces, Parkinson's disease, and Alzheimer's disease classification) and four established architectures of increasing complexity (ShallowConvNet, EEGNet, DeepConvNet, and Temporal-based ResNet). The comparison of over 100,000 trained models underscores, first, the importance of using subject-based cross-validation strategies for evaluating EEG deep learning architectures, except when within-subject analyses are acceptable (e.g., BCI). Second, it highlights the greater reliability of nested approaches (e.g., N-LNSO) compared to non-nested counterparts, which are prone to data leakage and favor larger models that overfit to the validation data. In conclusion, this work provides EEG deep learning researchers with an analysis of data partitioning and cross-validation and offers guidelines to avoid data leakage, which currently undermines the domain with potentially overestimated performance claims.
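
To make the subject-based, nested partitioning idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that N-LNSO refers to a nested leave-N-subjects-out scheme: an outer loop holds out a group of subjects for testing, and an inner loop holds out a second group for validation, so that no subject contributes samples to more than one of the three sets. The function names (`subject_folds`, `nested_lnso`, `indices_for`), the fold counts, and the toy subject IDs are all hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def subject_folds(subjects: Sequence[str], n_folds: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the unique subject IDs and deal them into n_folds disjoint groups."""
    unique = sorted(set(subjects))
    random.Random(seed).shuffle(unique)
    return [unique[i::n_folds] for i in range(n_folds)]


def nested_lnso(
    subjects: Sequence[str], n_outer: int, n_inner: int, seed: int = 0
) -> Iterator[Tuple[List[str], List[str], List[str]]]:
    """Yield (train, val, test) subject lists; the three lists never share a subject."""
    outer = subject_folds(subjects, n_outer, seed)
    for i, test in enumerate(outer):
        # Subjects not held out for testing are re-partitioned in the inner loop.
        remaining = [s for j, fold in enumerate(outer) if j != i for s in fold]
        inner = subject_folds(remaining, n_inner, seed + i + 1)
        for k, val in enumerate(inner):
            train = [s for j, fold in enumerate(inner) if j != k for s in fold]
            yield train, val, test


def indices_for(sample_subjects: Sequence[str], chosen: Sequence[str]) -> List[int]:
    """Map a list of subject IDs back to the indices of their samples."""
    chosen_set = set(chosen)
    return [idx for idx, s in enumerate(sample_subjects) if s in chosen_set]


# Toy data (hypothetical): 10 subjects, 3 EEG windows per subject.
sample_subjects = [f"S{n:02d}" for n in range(10) for _ in range(3)]
splits = list(nested_lnso(sample_subjects, n_outer=5, n_inner=4))

train, val, test = splits[0]
test_idx = indices_for(sample_subjects, test)
```

Splitting at the level of subject IDs, rather than individual windows, is what keeps windows from the same person out of both the training and evaluation sets. In this sketch the validation subjects would be used for model selection and early stopping, while the test subjects are touched only once per outer fold.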

Source
http://dx.doi.org/10.1016/j.compbiomed.2025.110608

Similar Publications

Single-cell analysis of Barrett's esophagus and carcinoma reveals cell types conferring risk via genetic predisposition.

Cell Genom

September 2025

Institute of Pathology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany. Electronic address:

Inherited genetic variants contribute to Barrett's esophagus (BE) and esophageal adenocarcinoma (EAC), but it is unknown which cell types are involved in this process. We performed single-cell RNA sequencing of BE, EAC, and paired normal tissues and integrated genome-wide association data to determine cell-type-specific genetic risk and cellular processes that contribute to BE and EAC. The analysis reveals that EAC development is driven to a greater extent by local cellular processes than BE development and suggests that one cell type of BE origin (intestinal metaplasia cells) and cellular processes that control the differentiation of columnar cells are of particular relevance for EAC development.

Background: Tobacco use remains a major public health challenge in sub-Saharan Africa, with significant gendered dimensions. Place of residence is an important determinant, as rural and urban contexts shape exposure, access, and consumption patterns. This study investigates rural-urban disparities in tobacco use among women in sub-Saharan Africa, with a focus on quantifying the relative contributions of socioeconomic factors.

How many (distinguishable) classes can we identify in single-particle analysis?

Acta Crystallogr D Struct Biol

October 2025

Centro Nacional de Biotecnologia-CSIC, Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain.

Heterogeneity in cryo-EM is essential for capturing the structural variability of macromolecules, reflecting their functional states and biological significance. However, estimating heterogeneity remains challenging due to particle misclassification and algorithmic biases, which can lead to reconstructions that blend distinct conformations or fail to resolve subtle differences. Furthermore, the low signal-to-noise ratio inherent in cryo-EM data makes it nearly impossible to detect minute structural changes, as noise often obscures subtle variations in macromolecular projections.

Ethnopharmacological Relevance: Moringa oleifera L. is widely used in Traditional Medicine across Africa and Asia for managing inflammation, infections, diabetes, and malnutrition. Although its aqueous and ethanolic extracts have been extensively studied, little is known about the safety of its non-polar (hexane) fraction, which may contain unique bioactive compounds.

Physicochemical Property Models for Poly- and Perfluorinated Alkyl Substances and Other Chemical Classes.

J Chem Inf Model

September 2025

United States Environmental Protection Agency, Center for Computational Toxicology and Exposure, 109 TW Alexander Dr., Research Triangle Park, North Carolina 27711, United States.

To assess environmental fate, transport, and exposure for PFAS (per- and polyfluoroalkyl substances), predictive models are needed to fill experimental data gaps for physicochemical properties. In this work, quantitative structure-property relationship (QSPR) models for octanol-water partition coefficient, water solubility, vapor pressure, boiling point, melting point, and Henry's law constant are presented. Over 200,000 experimental property value records were extracted from publicly available data sources.
