Related papers: Endogenous post-stratification in surveys: classif…
Auxiliary information can increase the efficiency of survey estimators through an assisting model when the model captures some of the relationship between the auxiliary data and the study variables. Despite their superior properties,…
In many randomized trials, outcomes such as essays or open-ended responses must be manually scored as a preliminary step to impact analysis, a process that is costly and limiting. Model-assisted estimation offers a way to combine surrogate…
To increase statistical efficiency in a randomized experiment, researchers often use stratification (i.e., blocking) in the design stage. However, conventional practices of stratification fail to exploit valuable information about the…
In observational surveys, post-stratification is used to reduce bias resulting from differences between the survey population and the population under investigation. However, this can lead to inflated post-stratification weights and,…
Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…
Post-treatment variables often complicate causal inference. They appear in many scientific problems, including noncompliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to…
Entropy is a measure of heterogeneity widely used in applied sciences, often when data are collected over space. Recently, a number of approaches have been proposed to include spatial information in entropy. The aim of entropy is to…
Post-stratification is often used to estimate treatment effects with higher efficiency. However, the majority of existing post-stratification frameworks depend on prior knowledge of the distributions of covariates and assume that the units…
A central theme in the field of survey statistics is estimating population-level quantities through data coming from potentially non-representative samples of the population. Multilevel Regression and Poststratification (MRP), a model-based…
A new approach to obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in estimating expectations…
The problem of estimating the proportion of units with a given attribute in a finite population is considered. A sample is drawn from the population by simple random sampling without replacement. There are limited funds for…
The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…
In surveys requiring cost efficiency, such as medical research, measuring the variable of interest (e.g., disease status) is expensive and/or time-consuming; however, we often have access to easily attainable characteristics about sampling…
Sound policy and decision making in developing countries is often limited by the lack of timely and reliable data. Crowdsourced data may provide a valuable alternative for data collection and analysis, e.g., in remote and insecure areas or…
Experiments studying get-out-the-vote (GOTV) efforts estimate the causal effect of various mobilization efforts on voter turnout. However, there is often substantial noncompliance in these studies. A usual approach is to use an instrumental…
This paper considers the problem of design-based inference for the average treatment effect in finely stratified experiments. Here, by "design-based" we mean that the only source of uncertainty stems from the randomness in treatment…
The natural occurrence of singular spaces in applications has led to recent investigations on performing topological data analysis (TDA) in a stratified framework. In many applications, there is no a priori information on what points should…
The need for small area estimates is increasingly felt in both the public and private sectors in order to formulate their strategic plans. It is now widely recognized that direct small area survey estimates are highly unreliable owing to…
Observational data are often accompanied by natural structural indices, such as time stamps or geographic locations, which are meaningful to prediction tasks but are often discarded. We leverage semantically meaningful indexing data while…
Optimal propensity score matching has emerged as one of the most ubiquitous approaches for causal inference studies on observational data; however, outstanding critiques of the statistical properties of propensity score matching have cast…