Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes.

James D Hibbert, Angela D Liese, Andrew Lawson, Dwayne E Porter, Robin C Puett, Debra Standiford, Lenna Liu, Dana Dabelea

Int J Health Geogr

Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA.

Published: October 2009

Background: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution).

Methods: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic.
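The contrast between fixed and random allocation can be illustrated with a minimal Python sketch. It assumes that fixed allocation assigns every case in a ZIP code to the tract with the largest weight, and that random allocation draws a tract with probability proportional to its weight; the abstract does not spell out either rule. The tract names, weights, and function names are hypothetical.

```python
import random

# Hypothetical weights for ONE ZIP code: the share of its population in each
# overlapping census tract. Tract names and values are illustrative only.
TRACT_WEIGHTS = {"tract_A": 0.6, "tract_B": 0.3, "tract_C": 0.1}


def fixed_allocation(weights):
    """Deterministic rule (assumed): pick the tract with the largest weight."""
    return max(weights, key=weights.get)


def random_allocation(weights, rng):
    """Stochastic rule: draw one tract with probability proportional to weight."""
    tracts = list(weights)
    return rng.choices(tracts, weights=[weights[t] for t in tracts], k=1)[0]


def allocate_cases(n_cases, weights, method, seed=0):
    """Assign n_cases from one ZIP code to tracts; return cases per tract."""
    rng = random.Random(seed)
    counts = {tract: 0 for tract in weights}
    for _ in range(n_cases):
        if method == "fixed":
            tract = fixed_allocation(weights)
        else:
            tract = random_allocation(weights, rng)
        counts[tract] += 1
    return counts


fixed_counts = allocate_cases(10, TRACT_WEIGHTS, "fixed")
random_counts = allocate_cases(10, TRACT_WEIGHTS, "random")
print(fixed_counts)   # every case lands in the highest-weight tract
print(random_counts)  # cases are spread across tracts according to the weights
```

Under these assumptions, the fixed rule places all cases from a ZIP code in a single tract, whereas the random rule distributes them across tracts in rough proportion to the chosen weighting factor.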

Results: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003).
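The group-level comparison can be sketched as a Pearson chi-squared statistic computed over the number of tracts with 0, 1, 2, and 3 or more cases. This is only an illustration of the general technique: the tract counts below are hypothetical and are not study data, and the abstract does not describe the exact test setup.

```python
def chi_squared_statistic(observed, expected):
    """Pearson chi-squared: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical numbers of tracts with 0, 1, 2, and 3+ cases (not study data).
true_tracts = [600, 250, 100, 50]
random_imputed = [590, 262, 98, 50]   # close to the true distribution
fixed_imputed = [800, 60, 40, 100]    # more empty tracts, more clusters

chi_random = chi_squared_statistic(random_imputed, true_tracts)
chi_fixed = chi_squared_statistic(fixed_imputed, true_tracts)
print(f"random: {chi_random:.2f}, fixed: {chi_fixed:.2f}")
```

A small statistic indicates that the imputed distribution is close to the true one. Converting the statistic to a p-value would require the chi-squared distribution with degrees of freedom equal to the number of categories minus one, assuming a simple goodness-of-fit setup.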

Conclusion: Fixed imputation methods seemed to yield the greatest accuracy at the individual level, suggesting their use for studies on area-level environmental exposures. However, fixed methods result in artificial clusters in single census tracts. For studies focusing on the spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should carefully consider the study aims.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2763852
DOI: http://dx.doi.org/10.1186/1476-072X-8-54

Similar Publications

MOLECULE: Molecular-dynamics and Optimized deep Learning for Entropy-regularized Classification and Uncertainty-aware Ligand Evaluation.

J Chem Theory Comput

September 2025

Dipartimento di Chimica, Università di Pavia, Via Taramelli 12, Pavia 27100, Italy.

Ivan Cucchi, Elena Frasnetti, Francesco Frigerio, Fabrizio Cinquini, Silvia Pavoni

Machine learning (ML) and deep learning (DL) methodologies have significantly advanced several aspects of drug discovery and design. Additionally, the integration of structure-based data has been shown to support and improve the models' predictions. Indeed, we previously demonstrated that combining molecular dynamics (MD)-derived descriptors with ML models makes it possible to classify kinase ligands effectively as allosteric or orthosteric.

Biological Age Estimation From the Age Gap Using Deep Learning Integrating Morbidity and Mortality: Model Development and Validation Study.

J Med Internet Res

September 2025

Department of Internal Medicine, Seoul National University College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea.

Seong-Eun Moon, Ji Won Yoon, Jae Hyun Bae, Shinyoung Joo, Yoo Hyung Kim

Background: Biological age (BA) is increasingly recognized as a valuable alternative to chronological age (CA) for assessing an individual's health and aging status. However, existing models are based on limited clinical parameters and have not thoroughly integrated morbidity and mortality data.

Objective: This study aimed to develop and validate a novel transformer-based model, referred to as the BA - CA gap model, for BA estimation that incorporates morbidity and mortality information to improve predictive accuracy and enhance clinical use in the early identification of the risk of age-related diseases.

Associations Between Food Insecurity and Child BMI: Cross-Sectional Versus Longitudinal Mediational Analysis of Maternal Weight-Related Parenting Practices and Concerns.

Matern Child Health J

September 2025

University of Southern California, 1845 N Soto St, Los Angeles, CA, 90032, USA.

Eleanor Shonkoff, Tyler Mason, Christine Naya, Genevieve F Dunton

Objective: To test whether parent restriction, pressure to eat, and maternal concern for child weight mediated the positive association between food insecurity and child body mass index (BMI) in cross-sectional and longitudinal analysis.

Methods: Data were from mother-child pairs (n = 202 at baseline). Children were M = 10.

Spatial Difference-in-Differences with Bayesian Disease Mapping Models.

Epidemiology

September 2025

School of Public Health and Community Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden.

Carl Bonander, Marta Blangiardo, Ulf Strömberg

Bayesian disease-mapping models are widely used in small-area epidemiology to account for spatial correlation and stabilize estimates through spatial smoothing. In contrast, difference-in-differences (DID) methods, commonly used to estimate treatment effects from observational panel data, typically ignore spatial dependence. This paper integrates disease-mapping models into an imputation-based DID framework to address spatially structured residual variation and improve precision in small-area evaluations.

Surgical outcomes from haematoma evacuation for intracerebral haemorrhage in the INTERACT3 study.

Lancet Reg Health West Pac

September 2025

Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, China.

Xin Hu, Menglu Ouyang, Jianguo Xu, Yi Liu, Xi Li

Background: There is ongoing controversy as to whether surgical haematoma evacuation benefits patients with acute intracerebral haemorrhage (ICH). This study aimed to evaluate the association between surgical haematoma evacuation and 6-month functional outcome in participants of the third Intensive Care Bundle with Blood Pressure Reduction in Acute Cerebral Haemorrhage Trial (INTERACT3).

Methods: This was a secondary analysis of INTERACT3, which enrolled adult patients (age ≥18 years) with spontaneous ICH within 6 h of onset.
