Article Abstract

In the universal quest to optimize machine-learning classifiers, three factors (model architecture, dataset size, and class balance) have been shown to influence test-time performance but do not fully account for it. Previously, evidence was presented for an additional factor that can be referred to as dataset quality, but it was unclear whether this was actually a joint property of the dataset and the model architecture, or an intrinsic property of the dataset itself. If quality is truly dataset-intrinsic and independent of model architecture, dataset size, and class balance, then the same datasets should perform better (or worse) regardless of these other factors. To test this hypothesis, here we create thousands of datasets, each controlled for size and class balance, and use them to train classifiers with a wide range of architectures, from random forests and support-vector machines to deep networks. We find that classifier performance correlates strongly by subset across architectures (correlation = 0.79), supporting quality as an intrinsic property of datasets, independent of dataset size, class balance, and model architecture. Digging deeper, we find that dataset quality appears to be an emergent property of something more fundamental: the quality of datasets' constituent classes. Thus, quality joins size, class balance, and model architecture as an independent correlate of performance and a separate target for optimizing machine-learning-based classification.
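The experimental logic in the abstract (draw subsets of fixed size and class balance, score each subset under several architectures, then correlate the scores by subset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `balanced_subset` and `spearman` are hypothetical, the accuracy values are invented, and a Spearman rank correlation is used only because the abstract does not say which correlation statistic the authors computed.

```python
import random
from statistics import mean


def balanced_subset(labels, per_class, rng):
    """Return indices of a subset holding exactly `per_class` examples of each class.

    Fixing `per_class` fixes both subset size and class balance, so any
    remaining performance differences between subsets cannot come from those.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    subset = []
    for label in sorted(by_class):
        subset.extend(rng.sample(by_class[label], per_class))
    return subset


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical test accuracies for the same five subsets under two architectures.
acc_forest = [0.71, 0.65, 0.80, 0.58, 0.77]
acc_network = [0.83, 0.79, 0.85, 0.74, 0.90]
rho = spearman(acc_forest, acc_network)
print(round(rho, 2))

# Drawing one size- and balance-controlled subset from toy labels.
toy_labels = [0, 0, 0, 1, 1, 1]
subset = balanced_subset(toy_labels, per_class=2, rng=random.Random(0))
```

A high value of `rho` would mean that subsets scoring well under one architecture also tend to score well under the other, which is the pattern the abstract interprets as evidence for dataset-intrinsic quality.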

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12148058PMC

Publication Analysis

Top Keywords

size class: 20
model architecture: 16
class balance: 16
dataset size: 12
dataset quality: 12
quality dataset-intrinsic: 8
architecture dataset: 8
property dataset: 8
intrinsic property: 8
balance model: 8

Similar Publications

The Atlantification of the Arctic is driving a northward habitat shift of many cetaceans, including sperm whales (Physeter macrocephalus). As Arctic warming continues to decrease sea ice extent and contributes to the change in species distributions, it is crucial to study how the distribution patterns, habitat, and the demographic structure of sperm whale populations may continue to change. In this study, we assess the temporal presence of echolocating sperm whales on the continental slope southwest of the Svalbard archipelago and compare it with acoustic backscatter and temperature as a proxy for biomass.

Aims: Obesity is commonly hypothesized to lead to the development of heart failure (HF) in part due to increases in blood volume (BV) and left ventricular (LV) remodelling. Whether adiposity and obesity severity are associated with BV expansion and subsequent LV remodelling in middle-aged individuals at increased risk (IR) prior to the onset of HF is unknown.

Methods And Results: We analysed data from 96 middle-aged (40-64 years) non-obese (25.

Using high- and low-surface flatness fruits of Ziziphus jujuba Mill. cv. "Lingwuchangzao" at different developmental stages as test materials, this study examined the mechanisms underlying variations in fruit appearance and internal quality.

Novel development of lipid-based formulations: Improved wettability and homogeneous API solid dispersion visualised via near-infrared hyperspectral imaging.

Eur J Pharm Biopharm

September 2025

Research Center Pharmaceutical Engineering GmbH, Inffeldgasse 13, 8010 Graz, Austria; University of Graz, Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology and Biopharmacy, Graz, Austria.

Lipid-based formulations have been successfully applied to improve the aqueous solubility of active pharmaceutical ingredients (APIs), but the limited wettability of these systems remains a bottleneck. In this study, a lipid-based system was developed using polyglycerol ester of fatty acids (PGFA) as the main component and hexaglycerol (PG6) as a wetting agent. Felodipine, a BCS class II compound, was selected as a model API.

Ionic Liquid Engineered Defect-Driven Green Emitting Zero-Dimensional CsPbBr Microdisks.

J Phys Chem Lett

September 2025

School of Chemical Sciences, National Institute of Science Education and Research (NISER), An OCC of Homi Bhabha National Institute Jatni, Khurda, Bhubaneswar 752050, Odisha, India.

Quantum-confined perovskites represent an emerging class of materials with great potential for optoelectronic applications. Specifically, zero-dimensional (0D) perovskites have garnered significant attention for their unique excitonic properties. However, achieving phase-pure, size-tunable 0D perovskite materials and gaining a clear understanding of their photophysical behavior remain challenging.
