Related papers: Toward design-based inference for data integration

200 papers

Non-probability samples are becoming increasingly popular in survey statistics but may suffer from selection biases that limit the generalizability of results to the target population. We consider integrating a non-probability sample with a…

Methodology · Statistics 2019-08-26 Shu Yang, Jae Kwang Kim, Rui Song

We establish a general framework for statistical inferences with non-probability survey samples when relevant auxiliary information is available from a probability survey sample. We develop a rigorous procedure for estimating the propensity…

Methodology · Statistics 2018-05-17 Yilin Chen, Pengfei Li, Changbao Wu

The aim of survey statistics is to produce estimates with minimal bias and a corresponding acceptable variance given a specific budget, preferably with a minor response burden for the participants. In recent years, considerable efforts…

Methodology · Statistics 2026-04-02 Martin Hyllienmark, Gustaf Strandell

The statistical challenges in using big data for making valid statistical inference in the finite population have been well documented in the literature. These challenges are due primarily to statistical bias arising from under-coverage in the…

Methodology · Statistics 2020-06-19 Jae-kwang Kim, Siu-Ming Tam

Integrating probability and non-probability samples is increasingly important, yet unknown sampling mechanisms in non-probability sources complicate identification and efficient estimation. We develop semiparametric theory for dual-frame…

Methodology · Statistics 2026-01-14 Kosuke Morikawa, Jae Kwang Kim

With the ubiquitous availability of unstructured data, growing attention is being paid to how to adjust for selection bias in such non-probability samples. The majority of the robust estimators proposed by prior literature are either fully or…

Methodology · Statistics 2022-04-08 Ali Rafei, Michael R. Elliott, Carol A. C. Flannagan

In this paper we study predictive mean matching mass imputation estimators to integrate data from probability and non-probability samples. We consider two approaches: matching predicted to predicted ($\hat{y}-\hat{y}$ matching; PMM A) and…

Methodology · Statistics 2024-06-18 Piotr Chlebicki, Łukasz Chrostowski, Maciej Beręsewicz

Design-based inference, also known as randomization-based or finite-population inference, provides a principled framework for trustworthy statistical inference by attributing randomness solely to the design mechanism (e.g., treatment…

Methodology · Statistics 2026-04-17 Siyu Heng, Yanxin Shen, Zijian Guo

Multiple heterogeneous data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we develop a unified framework of the test-and-pool approach to…

Methodology · Statistics 2023-05-30 Chenyin Gao, Shu Yang

Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a…

Methodology · Statistics 2022-12-08 Jack Noonan, Adetola Adedamola Adediran, Robin Mitra, Stefanie Biedermann

To generalize inferences from a randomized trial to the target population of all trial-eligible individuals, investigators can use nested trial designs, where the randomized individuals are nested within a cohort of trial-eligible…

Missing data is a universal problem in statistics. We develop a unified framework for estimating parameters defined by general estimating equations under a missing-at-random (MAR) mechanism, based on generalized entropy calibration…

Methodology · Statistics 2026-03-31 Mst Moushumi Pervin, Hengfang Wang, Jae Kwang Kim

The use of big data in official statistics and the applied sciences is accelerating, but statistics computed using only big data often suffer from substantial selection bias. This leads to inaccurate estimation and invalid statistical…

Methodology · Statistics 2023-08-11 Ryan Covey, Lucca Buonamano

The declining response rates in probability surveys along with the widespread availability of unstructured data have led to growing research into non-probability samples. Existing robust approaches are not well-developed for non-Gaussian…

Methodology · Statistics 2022-03-29 Ali Rafei, Michael R. Elliott, Carol A. C. Flannagan

This paper studies inference in two-stage randomized experiments under covariate-adaptive randomization. In the initial stage of this experimental design, clusters (e.g., households, schools, or graph partitions) are stratified and randomly…

Econometrics · Economics 2026-01-16 Jizhou Liu

Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…

Methodology · Statistics 2022-04-05 Gerhard Tutz

Causal inference on the average treatment effect (ATE) using non-probability samples, such as electronic health records (EHR), faces challenges from sample selection bias and high-dimensional covariates. This requires considering a…

Methodology · Statistics 2024-03-28 Jiacong Du, Xu Shi, Donglin Zeng, Bhramar Mukherjee

Statistical inference with non-probability survey samples is an emerging topic in survey sampling and official statistics and has gained increased attention from researchers and practitioners in the field. Much of the existing literature,…

Methodology · Statistics 2024-10-07 Yang Liu, Meng Yuan, Pengfei Li, Changbao Wu

We study moment-based estimation with two sequentially collected variables subject to non-monotone missingness. The commonly used Missing at Random (MAR) assumption requiring all missingness mechanisms to depend on the same fully observed…

Econometrics · Economics 2026-05-29 Shenshen Yang

It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical…

Methodology · Statistics 2020-06-24 Yajuan Si, Natesh S. Pillai, Andrew Gelman