radMLBench: A dataset collection for benchmarking in radiomics.

Comput Biol Med

Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, D-45147, Essen, Germany.

Published: November 2024


Article Abstract

Background: New machine learning methods and techniques are frequently introduced in radiomics, but they are often tested on a single dataset, which makes it difficult to assess their true benefit. Currently, no large, publicly accessible dataset collection exists on which such assessments could be performed. In this study, a collection of radiomics datasets with binary outcomes in tabular form was curated to allow benchmarking of machine learning methods and techniques.

Methods: A variety of journals and online sources were searched to identify tabular radiomics data with binary outcomes, which were then compiled into a homogeneous data collection that is easily accessible via Python. To illustrate the utility of the dataset collection, it was applied to investigate whether feature decorrelation prior to feature selection could improve predictive performance in a radiomics pipeline.

Results: A total of 50 radiomic datasets were collected, with sample sizes ranging from 51 to 969 and feature counts ranging from 101 to 11165. Using this data, it was observed that decorrelating features did not yield any significant improvement on average.

Conclusions: A large collection of datasets, easily accessible via Python and suitable for benchmarking and evaluating new machine learning techniques and methods, was curated. Its utility was exemplified by demonstrating that feature decorrelation prior to feature selection does not, on average, lead to significant performance gains and could be omitted, thereby increasing the robustness and reliability of the radiomics pipeline.
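
The feature decorrelation step examined in the abstract can be illustrated with a minimal sketch. The abstract does not say how the study decorrelated features, so the greedy Pearson-correlation filter below, the 0.95 threshold, the toy data, and the helper names `pearson` and `decorrelate` are all assumptions made for illustration, not the authors' implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def decorrelate(columns, threshold=0.95):
    """Greedily keep a feature only if its |r| with every kept feature is <= threshold.

    columns: dict mapping feature name -> list of values (one per sample).
    threshold: hypothetical cut-off chosen for this sketch.
    Returns the kept feature names in their original order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (not from the study): "f2" is an exact multiple of "f1",
# while "f3" is only weakly correlated with "f1".
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = decorrelate(features)
print(kept)
```

In a pipeline of the kind the abstract describes, a filter like this would run before feature selection; the reported finding is that skipping it does not, on average, significantly change predictive performance.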


Source
http://dx.doi.org/10.1016/j.compbiomed.2024.109140

