Missing data and prediction: the pattern submodel.

Biostatistics

Department of Biostatistics, Vanderbilt University, 2525 West End, Suite 1100, Nashville, TN, USA.

Published: April 2020

Article Abstract

Missing data are a common problem for both the construction and the implementation of a prediction algorithm. Pattern submodels (PS) are a set of submodels, one for every missing data pattern, each fit using only the data from that pattern. They are a computationally efficient remedy for handling missing data at both stages. Here, we show that PS (i) retain their predictive accuracy even when the missing data mechanism is not MAR (missing at random) and (ii) yield an algorithm that is the most predictive among all standard missing data strategies. Specifically, we show that the expected loss of a forecasting algorithm is minimized when each pattern-specific loss is minimized. Simulations and a re-analysis of the SUPPORT study confirm that PS generally outperform zero-imputation, mean-imputation, complete-case analysis, complete-case submodels, and even multiple imputation (MI). The degree of improvement is highly dependent on the missingness mechanism and on the effect size of the missing predictors. When the data are MAR, MI can yield comparable forecasting performance but generally at a larger computational cost. We also show that predictions from the PS approach are equivalent to the limiting predictions of an MI procedure that depends on missingness indicators (the MIMI model). The focus of this article is on out-of-sample prediction; implications for model inference are only briefly explored.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7868046
DOI: http://dx.doi.org/10.1093/biostatistics/kxy040