Related papers: Toward design-based inference for data integration

Doubly Robust Inference when Combining Probability and Non-probability Samples with High-dimensional Data

Non-probability samples have become increasingly popular in survey statistics but may suffer from selection biases that limit the generalizability of results to the target population. We consider integrating a non-probability sample with a…

Methodology · Statistics 2019-08-26 Shu Yang, Jae Kwang Kim, Rui Song

Doubly Robust Inference with Non-probability Survey Samples

We establish a general framework for statistical inferences with non-probability survey samples when relevant auxiliary information is available from a probability survey sample. We develop a rigorous procedure for estimating the propensity…

Methodology · Statistics 2018-05-17 Yilin Chen, Pengfei Li, Changbao Wu

Model Assisted Data Integration: An unbiased sampling strategy to use nonprobability data

The aim of survey statistics is to produce estimates with minimal bias and a correspondingly acceptable variance given a specific budget, preferably with a minor response burden for the participants. In recent years, considerable efforts…

Methodology · Statistics 2026-04-02 Martin Hyllienmark, Gustaf Strandell

Data Integration by combining big data and survey sample data for finite population inference

The statistical challenges in using big data for making valid statistical inference in the finite population have been well documented in the literature. These challenges are due primarily to statistical bias arising from under-coverage in the…

Methodology · Statistics 2020-06-19 Jae-kwang Kim, Siu-Ming Tam

Semiparametric Efficient Data Integration Using the Dual-Frame Sampling Framework

Integrating probability and non-probability samples is increasingly important, yet unknown sampling mechanisms in non-probability sources complicate identification and efficient estimation. We develop semiparametric theory for dual-frame…

Methodology · Statistics 2026-01-14 Kosuke Morikawa, Jae Kwang Kim

Robust Model-based Inference for Non-Probability Samples

With the ubiquitous availability of unstructured data, growing attention is being paid to how to adjust for selection bias in such non-probability samples. The majority of the robust estimators proposed by prior literature are either fully or…

Methodology · Statistics 2022-04-08 Ali Rafei, Michael R. Elliott, Carol A. C. Flannagan

Data integration of non-probability and probability samples with predictive mean matching

In this paper we study predictive mean matching mass imputation estimators to integrate data from probability and non-probability samples. We consider two approaches: matching predicted to predicted ($\hat{y}-\hat{y}$ matching; PMM A) and…

Methodology · Statistics 2024-06-18 Piotr Chlebicki, Łukasz Chrostowski, Maciej Beręsewicz

Propensity Score Propagation: A General Framework for Design-Based Inference with Unknown Propensity Scores

Design-based inference, also known as randomization-based or finite-population inference, provides a principled framework for trustworthy statistical inference by attributing randomness solely to the design mechanism (e.g., treatment…

Methodology · Statistics 2026-04-17 Siyu Heng, Yanxin Shen, Zijian Guo

Pretest estimation in combining probability and non-probability samples

Multiple heterogeneous data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we develop a unified framework of the test-and-pool approach to…

Methodology · Statistics 2023-05-30 Chenyin Gao, Shu Yang

An integrated approach to test for missing not at random

Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a…

Methodology · Statistics 2022-12-08 Jack Noonan, Adetola Adedamola Adediran, Robin Mitra, Stefanie Biedermann

Generalizing trial findings using nested trial designs with sub-sampling of non-randomized individuals

To generalize inferences from a randomized trial to the target population of all trial-eligible individuals, investigators can use nested trial designs, where the randomized individuals are nested within a cohort of trial-eligible…

Methodology · Statistics 2019-03-08 Issa J. Dahabreh, Miguel A. Hernan, Sarah E. Robertson, Ashley Buchanan, Jon A. Steingrimsson

A Calibration Framework for Inference with Partially Observed Data

Missing data is a universal problem in statistics. We develop a unified framework for estimating parameters defined by general estimating equations under a missing-at-random (MAR) mechanism, based on generalized entropy calibration…

Methodology · Statistics 2026-03-31 Mst Moushumi Pervin, Hengfang Wang, Jae Kwang Kim

Survey Design and Estimating Equations when Combining Big Data with Probability Samples

The use of big data in official statistics and the applied sciences is accelerating, but statistics computed using only big data often suffer from substantial selection bias. This leads to inaccurate estimation and invalid statistical…

Methodology · Statistics 2023-08-11 Ryan Covey, Lucca Buonamano

Robust and Efficient Bayesian Inference for Non-Probability Samples

The declining response rates in probability surveys, along with the widespread availability of unstructured data, have led to growing research into non-probability samples. Existing robust approaches are not well-developed for non-Gaussian…

Methodology · Statistics 2022-03-29 Ali Rafei, Michael R. Elliott, Carol A. C. Flannagan

Inference for Two-stage Experiments under Covariate-Adaptive Randomization

This paper studies inference in two-stage randomized experiments under covariate-adaptive randomization. In the initial stage of this experimental design, clusters (e.g., households, schools, or graph partitions) are stratified and randomly…

Econometrics · Economics 2026-01-16 Jizhou Liu

Probability and Non-Probability Samples: Improving Regression Modeling by Using Data from Different Sources

Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…

Methodology · Statistics 2022-04-05 Gerhard Tutz

Doubly robust causal inference through penalized bias-reduced estimation: combining non-probability samples with designed surveys

Causal inference on the average treatment effect (ATE) using non-probability samples, such as electronic health records (EHR), faces challenges from sample selection bias and high-dimensional covariates. This requires considering a…

Methodology · Statistics 2024-03-28 Jiacong Du, Xu Shi, Donglin Zeng, Bhramar Mukherjee

Statistical Inference with Nonignorable Non-Probability Survey Samples

Statistical inference with non-probability survey samples is an emerging topic in survey sampling and official statistics and has gained increased attention from researchers and practitioners in the field. Much of the existing literature,…

Methodology · Statistics 2024-10-07 Yang Liu, Meng Yuan, Pengfei Li, Changbao Wu

A Doubly Robust GMM Estimator for Sequential Non-monotone Missingness

We study moment-based estimation with two sequentially collected variables subject to non-monotone missingness. The commonly used Missing at Random (MAR) assumption, requiring all missingness mechanisms to depend on the same fully observed…

Econometrics · Economics 2026-05-29 Shenshen Yang

Bayesian Nonparametric Weighted Sampling Inference

It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical…

Methodology · Statistics 2020-06-24 Yajuan Si, Natesh S. Pillai, Andrew Gelman