Related papers: Toward design-based inference for data integration
Non-probability samples become increasingly popular in survey statistics but may suffer from selection biases that limit the generalizability of results to the target population. We consider integrating a non-probability sample with a…
We establish a general framework for statistical inferences with non-probability survey samples when relevant auxiliary information is available from a probability survey sample. We develop a rigorous procedure for estimating the propensity…
The aim of survey statistics is to produce estimates with a minimal bias and a corresponding acceptable variance given a specific budget, preferable with a minor response burden for the participants. In recent years, considerable efforts…
The statistical challenges in using big data for making valid statistical inference in the finite population have been well documented in literature. These challenges are due primarily to statistical bias arising from under-coverage in the…
Integrating probability and non-probability samples is increasingly important, yet unknown sampling mechanisms in non-probability sources complicate identification and efficient estimation. We develop semiparametric theory for dual-frame…
With the ubiquitous availability of unstructured data, growing attention is paid as how to adjust for selection bias in such non-probability samples. The majority of the robust estimators proposed by prior literature are either fully or…
In this paper we study predictive mean matching mass imputation estimators to integrate data from probability and non-probability samples. We consider two approaches: matching predicted to predicted ($\hat{y}-\hat{y}$~matching; PMM A) and…
Design-based inference, also known as randomization-based or finite-population inference, provides a principled framework for trustworthy statistical inference by attributing randomness solely to the design mechanism (e.g., treatment…
Multiple heterogeneous data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we develop a unified framework of the test-and-pool approach to…
Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a…
To generalize inferences from a randomized trial to the target population of all trial-eligible individuals, investigators can use nested trial designs, where the randomized individuals are nested within a cohort of trial-eligible…
Missing data is an universal problem in statistics. We develop a unified framework for estimating parameters defined by general estimating equations under a missing-at-random (MAR) mechanism, based on generalized entropy calibration…
The use of big data in official statistics and the applied sciences is accelerating, but statistics computed using only big data often suffer from substantial selection bias. This leads to inaccurate estimation and invalid statistical…
The declining response rates in probability surveys along with the widespread availability of unstructured data has led to growing research into non-probability samples. Existing robust approaches are not well-developed for non-Gaussian…
This paper studies inference in two-stage randomized experiments under covariate-adaptive randomization. In the initial stage of this experimental design, clusters (e.g., households, schools, or graph partitions) are stratified and randomly…
Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…
Causal inference on the average treatment effect (ATE) using non-probability samples, such as electronic health records (EHR), faces challenges from sample selection bias and high-dimensional covariates. This requires considering a…
Statistical inference with non-probability survey samples is an emerging topic in survey sampling and official statistics and has gained increased attention from researchers and practitioners in the field. Much of the existing literature,…
We study moment-based estimation with two sequentially collected variables subject to non-monotone missingness. The commonly used Missing at Random (MAR) assumption requiring all missingness mechanisms to depend on the same fully observed…
It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical…