Background: Leave-one-out cross-validation that fails to account for variable selection does not properly reflect prediction accuracy when the number of training sites is small. The impact on health effect estimates has rarely been studied. The objective of this study was to develop an improved validation procedure for land-use regression models with variable selection and investigate health effect estimates in relation to land-use regression model performance.
Methods: We randomly generated 10 training and test sets for nitrogen dioxide and particulate matter. For each training set, we developed models and evaluated them using a cross-holdout validation approach. Unlike standard leave-one-out cross-validation, which refits the model without repeating variable selection, cross-holdout validation develops a new model for each evaluation. We also implemented holdout validation, which evaluates model predictions using independent test sets. We evaluated the relationship between cross-holdout validation and holdout validation R² and estimates of the association between air pollution and forced vital capacity in the Dutch birth cohort.
Results: Cross-holdout validation R² values were generally identical to holdout validation R² values, but were notably smaller than leave-one-out cross-validation R² values. Decreases in forced vital capacity in relation to air pollution exposure were larger for land-use regression models that had larger holdout validation and cross-holdout validation R² rather than leave-one-out cross-validation R².
Conclusion: Cross-holdout validation accurately reflects predictive ability of land-use regression models and is a useful validation approach for small datasets. Land-use regression predictive ability in terms of holdout validation and cross-holdout validation rather than leave-one-out cross-validation was associated with the magnitude of health effect estimates in a case study.
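The contrast drawn in the Methods can be illustrated with a minimal sketch. It assumes that cross-holdout validation leaves out one site at a time and repeats variable selection on the remaining sites, while standard leave-one-out cross-validation selects variables once on all sites and only refits coefficients. The greedy forward selection, the synthetic data, and all function names are hypothetical stand-ins, not the authors' procedure or data.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Predict from coefficients returned by fit_ols."""
    return beta[0] + X @ beta[1:]


def r2(y, y_hat):
    """R² of predictions against observations."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def forward_select(X, y, max_vars=3):
    """Greedy forward selection by training R² (hypothetical stand-in
    for the variable selection used to build a land-use regression model)."""
    chosen = []
    for _ in range(max_vars):
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            beta = fit_ols(X[:, cols], y)
            score = r2(y, predict_ols(beta, X[:, cols]))
            if score > best_score:
                best_j, best_score = j, score
        chosen.append(best_j)
    return chosen


def loo_r2(X, y, reselect):
    """Leave-one-out R².

    reselect=False: variables are chosen once on all sites and only the
    coefficients are refitted per fold (standard leave-one-out).
    reselect=True: variables are chosen again on each reduced training
    set (the cross-holdout idea, as assumed in this sketch).
    """
    n = len(y)
    cols_all = forward_select(X, y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        cols = forward_select(X[train], y[train]) if reselect else cols_all
        beta = fit_ols(X[train][:, cols], y[train])
        preds[i] = predict_ols(beta, X[i:i + 1, cols])[0]
    return r2(y, preds)


# Synthetic example: 20 sites, 30 candidate predictors, one weak true signal.
rng = np.random.default_rng(0)
n_sites, n_pred = 20, 30
X = rng.normal(size=(n_sites, n_pred))
y = 0.5 * X[:, 0] + rng.normal(size=n_sites)

r2_standard = loo_r2(X, y, reselect=False)
r2_reselect = loo_r2(X, y, reselect=True)
print(f"leave-one-out R² (selection done once):    {r2_standard:.2f}")
print(f"leave-one-out R² (selection in each fold): {r2_reselect:.2f}")
```

With few sites and many candidate predictors, selecting variables on all sites lets the left-out site influence the model, which is the source of the optimism the abstract describes; repeating selection inside each fold removes that leakage.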
Full text | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5221608 | PMC |
http://dx.doi.org/10.1097/EDE.0000000000000404 | DOI Listing |
Environ Pollut
November 2021
Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute, Socinstrasse 57, P.O. Box, 4002 Basel, Switzerland; University of Basel, Petersplatz 1, P.O. Box, 4001 Basel, Switzerland.
Background: Air pollution is a major global public health problem. The situation is most severe in low- and middle-income countries, where pollution control measures and monitoring systems are largely lacking. Data to quantify the exposure to air pollution in low-income settings are scarce.
Sci Total Environ
September 2018
Barcelona Institute for Global Health (ISGlobal), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain.
Land-use regression (LUR) has been used to model local spatial variability of particulate matter in cities of high-income countries. Performance of LUR models is unknown in less urbanized areas of low-/middle-income countries (LMICs) experiencing complex sources of ambient air pollution and which typically have limited land use data. To address these concerns, we developed LUR models using satellite imagery (e.
Epidemiology
January 2016
From the (a) Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands; (b) Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA; (c) Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The N