Related papers: Semiparametric Efficient Data Integration Using th…
This paper considers an empirical likelihood inference for parameters defined by general estimating equations, when data are missing at random. The efficiency of existing estimators depends critically on correctly specifying the conditional…
Nonresponse after probability sampling is a universal challenge in survey sampling, often necessitating adjustments to mitigate sampling and selection bias simultaneously. This study explored the removal of bias and effective utilization of…
Many statistical estimands of interest (e.g., in regression or causality) are functions of the joint distribution of multiple random variables. But in some applications, data is not available that measures all random variables on each…
During the past few decades, missing-data problems have been studied extensively, with a focus on the ignorable missing case, where the missing probability depends only on observable quantities. By contrast, research into non-ignorable…
We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the…
Sample selection is pervasive in applied economic studies. This paper develops semiparametric selection models that achieve point identification without relying on exclusion restrictions, an assumption long believed necessary for…
The aim of survey statistics is to produce estimates with minimal bias and a correspondingly acceptable variance given a specific budget, preferably with a minor response burden for the participants. In recent years, considerable efforts…
In this paper we study predictive mean matching mass imputation estimators to integrate data from probability and non-probability samples. We consider two approaches: matching predicted to predicted ($\hat{y}-\hat{y}$ matching; PMM A) and…
We consider statistical inference for a finite-dimensional parameter in a regular semiparametric model under a distributed setting with blockwise missingness, where entire blocks of variables are unavailable at certain sites and sharing…
In this review we cover the basics of efficient nonparametric parameter estimation (also called functional estimation), with a focus on parameters that arise in causal inference problems. We review both efficiency bounds (i.e., what is the…
We consider the efficient estimation of the semiparametric additive transformation model with current status data. A wide range of survival models and econometric models can be incorporated into this general transformation framework. We…
Valid statistical inference is challenging when the sample is subject to unknown selection bias. Data integration can be used to correct for selection bias when we have a parallel probability sample from the same population with some common…
We consider statistical inference under a semi-supervised setting where we have access to both a labeled dataset consisting of pairs $\{X_i, Y_i \}_{i=1}^n$ and an unlabeled dataset $\{ X_i \}_{i=n+1}^{n+N}$. We ask the question: under what…
Data analysis based on information from several sources is common in economic and biomedical studies. This setting is often referred to as the data fusion problem, which differs from traditional missing data problems since no complete data…
We study the identification and estimation of statistical functionals of multivariate data missing non-monotonically and not-at-random, taking a semiparametric approach. Specifically, we assume that the missingness mechanism satisfies what…
Suppose we have individual data from an internal study and various summary statistics from relevant external studies. External summary statistics have the potential to improve statistical inference for the internal population; however, it…
Combining information from multiple samples is often needed in biomedical and economic studies, but the differences between these samples must be appropriately taken into account in the analysis of the combined data. We study estimation for…
We propose a semiparametric data fusion framework for efficient inference on survival probabilities by integrating right-censored and current status data. Existing data fusion methods focus largely on fusing right-censored data only, while…
Integrating non-probability samples into finite-population inference typically requires modeling unknown selection probabilities under a missing-at-random (MAR) assumption that is difficult to verify. We propose a design-based alternative…
We propose nonparametric identification and semiparametric estimation of joint potential outcome distributions in the presence of confounding. First, in settings with observed confounding, we derive tighter, covariate-informed bounds on the…