Importance of sample size on the quality and utility of AI-based prediction models for healthcare.

Richard D Riley, Joie Ensor, Kym I E Snell, Lucinda Archer, Rebecca Whittle, Paula Dhiman, Joseph Alderman, Xiaoxuan Liu, Laura Kirton, Jay Manson-Whitton, Maarten van Smeden, Karel G Moons, Krishnarajah Nirantharakumar, Jean-Baptiste Cazier, Alastair K Denniston, Ben Van Calster, Gary S Collins

Lancet Digit Health

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.

Published: June 2025

Rigorous study design and analytical standards are required to generate reliable findings in healthcare from artificial intelligence (AI) research. One crucial but often overlooked aspect is the determination of appropriate sample sizes for studies developing AI-based prediction models for individual diagnosis or prognosis. Specifically, the number of participants and outcome events required in datasets for model training and evaluation remains inadequately addressed. Most AI studies do not provide a rationale for their chosen sample sizes and frequently rely on datasets that are inadequate for training or evaluating a clinical prediction model. Among the ten principles of Good Machine Learning Practice established by the US Food and Drug Administration, the UK Medicines and Healthcare products Regulatory Agency, and Health Canada, guidance on sample size is directly relevant to at least three principles. To reinforce this recommendation, we outline seven reasons why inadequate sample size negatively affects model training, evaluation, and performance. Using a range of examples, we illustrate these issues and discuss the potentially harmful consequences for patient care and clinical adoption. Additionally, we address challenges associated with increasing sample sizes in AI research and highlight existing approaches and software for calculating the minimum sample sizes required for model training and evaluation.

DOI: http://dx.doi.org/10.1016/j.landig.2025.01.013

Similar Publications

Comparative Safety of JAK Inhibitors vs TNF Antagonists in Immune-Mediated Inflammatory Diseases: A Systematic Review and Meta-Analysis.

JAMA Netw Open

September 2025

Division of Gastroenterology, Department of Medicine, University of California San Diego, La Jolla.

Virginia Solitano, Dhruv Ahuja, Han Hee Lee, Ritu Gaikwad, Kuan-Hung Yeh

Importance: Janus kinase (JAK) inhibitors are highly effective medications for several immune-mediated inflammatory diseases (IMIDs). However, safety concerns have led to regulatory restrictions.

Objective: To compare the risk of adverse events with JAK inhibitors vs tumor necrosis factor (TNF) antagonists in patients with IMIDs in head-to-head comparative effectiveness studies.

From Evidence to Bedside: The Role of Short-Acting β-Blockers in Sepsis Care.

Cardiol Rev

September 2025

Departments of Cardiology and Medicine, Westchester Medical Center and New York Medical College, Valhalla, NY.

Amogh Jyothi Arun, Bhavika Darji, Madiha Baig, William H Frishman, Wilbert S Aronow

Sepsis remains a leading cause of critical illness and mortality worldwide, driven by a dysregulated host response to infection and often complicated by persistent tachycardia and cardiovascular dysfunction. Increasing evidence implicates excessive sympathetic activation as a contributor to sepsis-related hemodynamic instability and myocardial injury, prompting growing interest in the use of β-adrenergic blockade as a therapeutic adjunct. This review synthesizes current data on the safety and efficacy of short-acting, cardioselective β-blockers (BBs), particularly esmolol and landiolol, in septic shock.

Predicting Remission in Schizophrenia Using Machine Learning-Assessing the Impact of Sample Size and Predictor Overinclusion.

Acta Psychiatr Scand

September 2025

Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.

Fredrik Hieronymus, Magnus Hieronymus, Axel Sjöstedt, Staffan Nilsson, Jakob Näslund

Introduction: Machine learning studies sometimes include a high number of predictors relative to the number of training cases. This increases the risk of overfitting and poor generalizability. A recent study hypothesized that between-trial heterogeneity precluded generalizable outcome prediction in schizophrenia from being achieved.

Composite endpoints in health technology assessment: Part 1 - an illustration of best modeling practice.

J Comp Eff Res

September 2025

British Heart Foundation, University of Glasgow, Glasgow, UK.

Andrew Briggs, Aris Angelis, Jieling Chen, David Booth, Jason A Davis

Composite endpoints amalgamate multiple clinical outcomes into a single measure, offering efficiency gains in clinical trials through increased event rates and reduced sample sizes, thus accelerating clinical development and regulatory approval. However, employing composite endpoints introduces complexities into health technology assessments (HTAs), particularly in economic modeling, due to the varying clinical significance and cost implications of the components. In this paper, we explore best modeling practice for HTAs that are based on clinical trials that employ composite endpoints.

Prognostic models for radiation-induced complications after radiotherapy in head and neck cancer patients.

Cochrane Database Syst Rev

September 2025

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands.

Toshihiko Takada, Makbule Tambas, Enrico Clementel, Artuur Leeuwenberg, Marjan Sharabiani

Background: Radiotherapy is the mainstay of treatment for head and neck cancer (HNC) but may induce various side effects on surrounding normal tissues. To reach an optimal balance between tumour control and toxicity prevention, normal tissue complication probability (NTCP) models have been reported to predict the risk of radiation-induced side effects in patients with HNC. However, the quality of study design, conduct, and analysis (i.
