J Surv Stat Methodol
February 2023
J Off Stat
September 2021
A non-probability sampling mechanism arising from non-response or non-selection is likely to bias estimates of parameters with respect to a target population of interest. This bias poses a unique challenge when selection is 'non-ignorable', i.e. when the probability of inclusion depends on the values of the survey outcome itself, even after conditioning on observed covariates.
Randomized clinical trials with outcome measured longitudinally are frequently analyzed using either random-effects models or generalized estimating equations. Both approaches assume that the dropout mechanism is missing at random (MAR) or missing completely at random (MCAR). We propose a Bayesian pattern-mixture model to incorporate missingness mechanisms that might be missing not at random (MNAR), where the distribution of the outcome measure at the follow-up time, conditional on the prior history, differs across the patterns of missing data.
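A minimal sketch of the pattern-mixture idea (not the authors' Bayesian model; the data, the pattern coding, and the sensitivity parameter delta are all hypothetical): stratify subjects by dropout pattern, estimate pattern-specific means, and average them over the estimated pattern probabilities, with an explicit MNAR assumption for the unidentified part.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per subject, 'pattern' is the last visit
# observed (3 = completer), 'y_last' the last observed outcome value.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "pattern": rng.choice([1, 2, 3], size=300, p=[0.2, 0.3, 0.5]),
    "y_last": rng.normal(0, 1, size=300),
})

# Pattern-mixture idea: the marginal mean is a mixture of
# pattern-specific means weighted by pattern probabilities.
probs = df["pattern"].value_counts(normalize=True)
means = df.groupby("pattern")["y_last"].mean()

# Sensitivity parameter delta shifts the unidentified mean for dropouts
# (an MNAR assumption, not something estimable from the data).
delta = -0.5
adjusted = means.copy()
adjusted[adjusted.index < 3] += delta  # dropouts assumed worse by delta

mar_mean = (probs * means).sum()
mnar_mean = (probs * adjusted).sum()
print(f"MAR-style mean: {mar_mean:.3f}, MNAR-adjusted: {mnar_mean:.3f}")
```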
Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. Importantly, the lack of transparency around methodological decisions is threatening the validity and reproducibility of modern research.
We consider comparative effectiveness research (CER) from observational data with two or more treatments. In observational studies, the estimation of causal effects is prone to bias due to confounders related to both treatment and outcome. Methods based on propensity scores are routinely used to correct for such confounding biases.
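For orientation, a generic inverse-probability-of-treatment weighting (IPTW) sketch for three arms, one common propensity-score correction (illustrative simulated data; not necessarily the estimator developed in this paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=(n, 3))                  # confounders
score = x[:, 0] + rng.normal(scale=0.5, size=n)
t = np.digitize(score, [-0.5, 0.5])          # three arms, driven by x[:, 0]
y = x[:, 0] + 0.5 * t + rng.normal(size=n)   # outcome confounded by x[:, 0]

# Generalized propensity scores via multinomial logistic regression.
ps_model = LogisticRegression(max_iter=1000).fit(x, t)
ps = ps_model.predict_proba(x)[np.arange(n), t]   # P(T = t_i | X_i)

# IPTW estimate of each arm's mean outcome (Horvitz-Thompson form).
w = 1.0 / ps
for arm in range(3):
    mask = t == arm
    print(arm, np.sum(w[mask] * y[mask]) / np.sum(w[mask]))
```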
J Surv Stat Methodol
November 2020
With the current focus of survey researchers on "big data" that are not selected by probability sampling, measures of the degree of potential sampling bias arising from this nonrandom selection are sorely needed. Existing indices of this degree of departure from probability sampling, like the R-indicator, are based on functions of the propensity of inclusion in the sample, estimated by modeling the inclusion probability as a function of auxiliary variables. These methods are agnostic about the relationship between the inclusion probability and survey outcomes, which is a crucial feature of the problem.
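For reference, the R-indicator is a simple function of the spread of estimated inclusion propensities; a minimal sketch under an assumed logistic propensity model with simulated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
z = rng.normal(size=(5000, 2))                             # auxiliary variables
included = rng.random(5000) < 1 / (1 + np.exp(-z[:, 0]))   # selection

# Estimate inclusion propensities from auxiliary variables only;
# the survey outcome never enters, which is the limitation noted above.
rho = LogisticRegression().fit(z, included).predict_proba(z)[:, 1]

r_indicator = 1 - 2 * rho.std()   # R = 1 indicates a fully representative sample
print(f"R-indicator: {r_indicator:.3f}")
```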
J R Stat Soc Ser C Appl Stat
November 2019
Rising costs of survey data collection and declining response rates have caused researchers to turn to non-probability samples to make descriptive statements about populations. However, unlike probability samples, non-probability samples may produce severely biased descriptive estimates due to selection bias. The paper develops and evaluates a simple model-based index of the potential selection bias in estimates of population proportions due to non-ignorable selection mechanisms.
J Surv Stat Methodol
September 2019
The most widespread method of computing confidence intervals (CIs) in complex surveys is to add and subtract the margin of error (MOE) from the point estimate, where the MOE is the estimated standard error multiplied by the suitable Gaussian quantile. This Wald-type interval is used by the American Community Survey (ACS), the largest US household sample survey. For inferences on small proportions with moderate sample sizes, this method often results in marked under-coverage, with a lower CI endpoint less than 0.
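The failure mode is easy to reproduce. A short sketch compares the Wald (MOE) interval with the Wilson score interval, one common remedy, for a small observed proportion:

```python
import numpy as np
from scipy.stats import norm

count, n, alpha = 3, 200, 0.05        # small observed proportion
p_hat = count / n
z = norm.ppf(1 - alpha / 2)

# Wald / MOE interval: point estimate +/- z * estimated SE.
moe = z * np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - moe, p_hat + moe)     # lower endpoint can fall below 0

# Wilson score interval: behaves properly near the boundary.
center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
wilson = (center - half, center + half)

print("Wald:  ", wald)      # lower endpoint is negative here
print("Wilson:", wilson)    # stays inside [0, 1]
```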
Accidents are a leading cause of death in U.S. active duty personnel.
A case study is presented assessing the impact of missing data on the analysis of daily diary data from a study evaluating the effect of a drug for the treatment of insomnia. The primary analysis averaged daily diary values for each patient into a weekly variable. Following the commonly used approach, missing daily values within a week were ignored provided a minimum number of daily diary reports was available.
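A sketch of that weekly-averaging rule (the column names and the minimum-report threshold of 4 of 7 days are assumptions for illustration; the study's actual threshold is not stated in this snippet):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
days = pd.date_range("2024-01-01", periods=28, freq="D")
diary = pd.Series(rng.normal(6.5, 1.0, len(days)), index=days)
diary[rng.random(len(days)) < 0.3] = np.nan    # ~30% missing daily entries

MIN_REPORTS = 4   # assumed threshold: need at least 4 of 7 daily reports

def weekly_mean(week: pd.Series) -> float:
    # Average the observed days, but only if enough were reported;
    # otherwise the whole week is treated as missing.
    return week.mean() if week.notna().sum() >= MIN_REPORTS else np.nan

weekly = diary.resample("7D").apply(weekly_mean)
print(weekly)
```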
Objectives: To recommend methodological standards in the prevention and handling of missing data for primary patient-centered outcomes research (PCOR).
Study Design And Setting: We searched the National Library of Medicine Bookshelf and Catalog as well as regulatory agencies' and organizations' Web sites in January 2012 for guidance documents that had formal recommendations regarding missing data. We extracted the characteristics of included guidance documents and recommendations.
Surv Methodol
December 2012
This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function.
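For contrast, the classical design-based approach inverts a weighted empirical (Hájek-type) CDF rather than a model-based one; a minimal sketch of that inversion step with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(size=500)           # sampled continuous survey values
pi = rng.uniform(0.05, 0.5, 500)      # inclusion probabilities
w = 1.0 / pi                          # design weights

def weighted_quantile(y, w, q):
    # Invert the weighted empirical CDF at probability q.
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cdf = np.cumsum(w_sorted) / np.sum(w_sorted)
    return y_sorted[np.searchsorted(cdf, q)]

print("weighted median:", weighted_quantile(y, w, 0.5))
```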
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA, 1987) …
J R Stat Soc Ser C Appl Stat
May 2010
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al., 1996).
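Under the usual instrumental-variable assumptions (randomization, exclusion restriction, monotonicity), the CACE for a single active arm is the intention-to-treat effect divided by the compliance rate. A worked sketch with made-up numbers:

```python
# Hypothetical two-arm illustration (the paper treats multi-arm trials).
# Assignment Z is randomized; D indicates actually receiving treatment.
itt_effect = 2.0    # E[Y | Z=1] - E[Y | Z=0], made-up value
p_comply   = 0.8    # P(D=1 | Z=1) - P(D=1 | Z=0), made-up value

# CACE = ITT effect / compliance rate (the Wald / IV estimator):
cace = itt_effect / p_comply
print(f"CACE estimate: {cace:.2f}")   # 2.0 / 0.8 = 2.50
```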
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this article, we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed.
J R Stat Soc Ser C Appl Stat
November 2010
J Med Internet Res
November 2010
Background: The Internet provides us with tools (user metrics or paradata) to evaluate how users interact with online interventions. Analysis of these paradata can lead to design improvements.
Objective: The objective was to explore the qualities of online participant engagement in an online intervention.
Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited.
We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling.
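A rough non-Bayesian analog conveys the structure: an ordinary probit fit on an unpenalized B-spline basis of the inclusion probabilities, used to predict the non-sampled units (toy data; the BPSP estimator itself penalizes the spline and samples the posterior by Gibbs sampling):

```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix, build_design_matrices

rng = np.random.default_rng(5)
N = 20000                                   # finite population size (toy)
pi_all = rng.uniform(0.02, 0.30, N)         # inclusion probabilities
y_all = (rng.random(N) < 0.2 + 0.6 * pi_all).astype(int)
sampled = rng.random(N) < pi_all            # realized sample

# Probit regression of the binary outcome on a B-spline basis of pi.
train = dmatrix("bs(pi, df=5, lower_bound=0, upper_bound=0.35)",
                {"pi": pi_all[sampled]})
fit = sm.Probit(y_all[sampled], np.asarray(train)).fit(disp=0)

# Predict outcome probabilities for the non-sampled units and combine
# with the observed sample total to estimate the population proportion.
(pred,) = build_design_matrices([train.design_info],
                                {"pi": pi_all[~sampled]})
p_hat = (y_all[sampled].sum() + fit.predict(np.asarray(pred)).sum()) / N
print(f"estimated population proportion: {p_hat:.3f}")
```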
Bayesian Anal
January 2010
This work is motivated by a quantitative Magnetic Resonance Imaging study of the relative change in tumor vascular permeability during the course of radiation therapy. The differences in tumor and healthy brain tissue physiology and pathology constitute a notable feature of the image data: spatial heterogeneity with respect to its contrast uptake profile (a surrogate for permeability) and radiation-induced changes in this profile. To account for these spatial aspects of the data, we employ a Gaussian hidden Markov random field (MRF) model.
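The role of the MRF prior can be seen in a toy iterated-conditional-modes (ICM) sketch, a common approximate fitting scheme and not necessarily the estimation method of this paper: each voxel's label trades off Gaussian fit against agreement with its neighbors (class means and noise scale are assumed known here):

```python
import numpy as np

rng = np.random.default_rng(6)
true = np.zeros((40, 40), dtype=int)
true[10:30, 10:30] = 1                                # toy "tumor" region
img = rng.normal(loc=true.astype(float), scale=0.8)   # noisy image

mu = np.array([0.0, 1.0])            # class means (assumed known)
beta = 1.5                           # MRF smoothing strength
labels = (img > 0.5).astype(int)     # initialize by thresholding

for _ in range(10):                  # ICM sweeps over interior voxels
    for i in range(1, 39):
        for j in range(1, 39):
            nbrs = [labels[i-1, j], labels[i+1, j],
                    labels[i, j-1], labels[i, j+1]]
            # Gaussian data term plus Potts prior rewarding agreement.
            cost = [(img[i, j] - mu[k]) ** 2 / (2 * 0.8 ** 2)
                    - beta * sum(n == k for n in nbrs)
                    for k in (0, 1)]
            labels[i, j] = int(np.argmin(cost))

print("misclassified voxels:", int((labels != true).sum()))
```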
Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a "similar" unit. Despite being used extensively in practice, the theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and obtain inferences from the completed data set.
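In its simplest form, the hot deck replaces each missing value with a random observed donor drawn from the same adjustment cell; a minimal sketch (the cell variable and missingness rate are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "cell": rng.choice(["A", "B", "C"], size=200),   # adjustment cells
    "y": rng.normal(size=200),
})
df.loc[rng.random(200) < 0.25, "y"] = np.nan         # item nonresponse

def hot_deck(group: pd.Series) -> pd.Series:
    # Replace missing values with randomly drawn observed donors from
    # the same cell (with replacement); assumes each cell has donors.
    donors = group.dropna()
    fill = rng.choice(donors.to_numpy(), size=group.isna().sum())
    out = group.copy()
    out[out.isna()] = fill
    return out

df["y_imputed"] = df.groupby("cell")["y"].transform(hot_deck)
```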
J Sch Health
February 2010
Background: Asthma is a serious problem for low-income preteens living in disadvantaged communities. Among the chronic diseases of childhood and adolescence, asthma has the highest prevalence and related health care use. School-based asthma interventions have proven successful for older and younger students, but results have not been demonstrated for those in middle school.
Background: The goal of the present study was to quantify the population-based background serum concentrations of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) by using data from the reference population of the 2005 University of Michigan Dioxin Exposure Study (UMDES) and the 2003-2004 National Health and Nutrition Examination Survey (NHANES).
Methods: Multiple imputation was used to impute the serum TCDD concentrations below the limit of detection by combining the 2 data sources. The background mean, quartiles, and 95th percentile serum TCDD concentrations were estimated by age and sex using linear and quantile regressions for complex survey data.
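The core imputation step can be sketched as drawing below-detection values from a lognormal distribution truncated above at the limit of detection (parameter values are illustrative; the actual analysis pooled two surveys and used survey-weighted regressions):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(8)
lod = 2.0                      # limit of detection (illustrative units)
mu, sigma = 0.5, 0.9           # assumed log-scale mean and SD

# On the log scale, values below the LOD follow a normal distribution
# truncated above at (log(lod) - mu) / sigma.
b = (np.log(lod) - mu) / sigma
n_missing = 50
log_draws = truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma,
                          size=n_missing, random_state=rng)
imputed = np.exp(log_draws)    # back to the concentration scale

# Repeating this M times (with fresh parameter draws) yields M imputations.
print(imputed.max() < lod)     # True: all draws fall below the LOD
```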
Selection models and pattern-mixture models are often used to deal with nonignorable dropout in longitudinal studies. These two classes of models are based on different factorizations of the joint distribution of the outcome process and the dropout process. We consider a new class of models, called mixed-effect hybrid models (MEHMs), where the joint distribution of the outcome process and dropout process is factorized into the marginal distribution of random effects, the dropout process conditional on random effects, and the outcome process conditional on dropout patterns and random effects.
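The three factorizations side by side, with y the outcome process, r the dropout pattern, and b the random effects:

```latex
\begin{align*}
\text{selection model:}       \quad & f(y, r) = f(y)\, f(r \mid y) \\
\text{pattern-mixture model:} \quad & f(y, r) = f(r)\, f(y \mid r) \\
\text{MEHM:}                  \quad & f(y, r, b) = f(b)\, f(r \mid b)\, f(y \mid r, b)
\end{align*}
```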
Consider a meta-analysis of studies with varying proportions of patient-level missing data, and assume that each primary study has made certain missing data adjustments so that the reported estimates of treatment effect size and variance are valid. These estimates of treatment effects can be combined across studies by standard meta-analytic methods, employing a random-effects model to account for heterogeneity across studies. However, we note that a meta-analysis based on the standard random-effects model will lead to biased estimates when the attrition rates of primary studies depend on the size of the underlying study-level treatment effect.
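The standard random-effects combination is typically the DerSimonian-Laird estimator; a minimal sketch with made-up study-level estimates (this is the method whose bias under effect-dependent attrition the paper analyzes):

```python
import numpy as np

# Hypothetical study-level effect sizes and within-study variances.
theta = np.array([0.30, 0.10, 0.45, 0.25])
v = np.array([0.02, 0.03, 0.05, 0.01])

# DerSimonian-Laird estimate of the between-study variance tau^2.
w = 1 / v
theta_fixed = np.sum(w * theta) / np.sum(w)
q = np.sum(w * (theta - theta_fixed) ** 2)
k = len(theta)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects weights and pooled estimate.
w_re = 1 / (v + tau2)
theta_re = np.sum(w_re * theta) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"pooled effect: {theta_re:.3f} (SE {se_re:.3f}), tau^2 = {tau2:.4f}")
```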