Related papers: Estimating prediction error for complex samples
The Horvitz-Thompson (HT) estimator is widely used in survey sampling. However, the variance of the HT estimator becomes large when the inclusion probabilities are highly heterogeneous. To overcome this shortcoming, in this paper, a…
In survey sampling, survey data do not necessarily represent the target population, and the samples are often biased. However, information on the survey weights aids in the elimination of selection bias. The Horvitz-Thompson estimator is a…
The cross-classified sampling design consists in drawing samples from a two-dimension population, independently in each dimension. Such design is commonly used in consumer price index surveys and has been recently applied to draw a sample…
We elaborate on a deconvolution method, used to estimate the empirical distribution of unknown parameters, as suggested recently by Efron (2013). It is applied to estimating the empirical distribution of the 'sampling probabilities' of m…
Randomized controlled trials often suffer from interference, a violation of the Stable Unit Treatment Values Assumption (SUTVA) in which a unit's treatment assignment affects the outcomes of its neighbors. This interference causes bias in…
The generalization ability of minimizers of the empirical risk in the context of binary classification has been investigated under a wide variety of complexity assumptions for the collection of classifiers over which optimization is…
An additive model-assisted nonparametric method is investigated to estimate the finite population totals of massive survey data with the aid of auxiliary information. A class of estimators is proposed to improve the precision of the well…
Accurate heterogeneous treatment effect (HTE) estimation is essential for personalized recommendations, making it important to evaluate and compare HTE estimators. Traditional assessment methods are inapplicable due to missing…
Bipartite incidence graph sampling provides a unified representation of many sampling situations for the purpose of estimation, including the existing unconventional sampling methods, such as indirect, network or adaptive cluster sampling,…
The Average Treatment Effect (ATE) is a global measure of the effectiveness of an experimental treatment intervention. Classical methods of its estimation either ignore relevant covariates or do not fully exploit them. Moreover, past work…
Under the Neyman causal model, it is well-known that OLS with treatment-by-covariate interactions cannot harm asymptotic precision of estimated treatment effects in completely randomized experiments. But do such guarantees extend to…
In this paper, we develop a multiply robust inference procedure of the average treatment effect (ATE) for data with high-dimensional covariates. We consider the case where it is difficult to correctly specify a single parametric model for…
Bayesian estimation is increasingly popular for performing model based inference to support policymaking. These data are often collected from surveys under informative sampling designs where subject inclusion probabilities are designed to…
We systematically investigate issues due to mis-specification that arise in estimating causal effects when (treatment) interference is informed by a network available pre-intervention, i.e., in situations where the outcome of a unit may…
We study the problem of selecting the best heterogeneous treatment effect (HTE) estimator from a collection of candidates in settings where the treatment effect is fundamentally unobserved. We cast estimator selection as a multiple testing…
We propose an estimator of prediction error using an approximate message passing (AMP) algorithm that can be applied to a broad range of sparse penalties. Following Stein's lemma, the estimator of the generalized degrees of freedom, which…
Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and meet resource constraints on bandwidth or battery power. Estimators that are applied to the…
The Horvitz-Thompson (H-T) estimator is widely used for estimating network causal effects. We study its optimality properties by embedding it in the class of all linear estimators. We show that, under any form of interference, the H-T…
Current methods for population mean estimation from data collected by Respondent Driven Sampling (RDS) are based on the Horvitz-Thompson estimator together with a set of assumptions on the sampling model under which the inclusion…
Capture-recapture methods for estimating the total size of elusive populations are widely-used, however, due to the choice of estimator impacting upon the results and conclusions made, the question of performance of each estimator is…