Related papers: Selective inference with a randomized response
We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection…
We review approaches to statistical inference based on randomization. Permutation tests are treated as an important special case. Under a certain group invariance property, referred to as the ``randomization hypothesis,'' randomization…
Selective inference aims at providing valid inference after a data-driven selection of models or hypotheses. It is essential to avoid overconfident results and replicability issues. While significant advances have been made in this area for…
Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are…
As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample-splitting attains valid inference on model properties by using separate…
In this paper we introduce randomized $t$-type statistics that will be referred to as randomized pivots. We show that these randomized pivots yield central limit theorems with a significantly smaller magnitude of error as compared to that…
Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all…
We consider the problem of inference in shift-share research designs. The choice between existing approaches that allow for unrestricted spatial correlation involves tradeoffs, varying in terms of their validity when there are relatively…
Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method includes the use of exponential family distributions, as well as quasi-likelihood modeling for…
Selective inference is a subfield of statistics that enables valid inference after selection of a data-dependent question. In this paper, we introduce selectively dominant p-values, a class of p-values that allow practitioners to easily…
We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven (sub)collection of hypotheses is chosen from some large universe of hypotheses.…
External controls from historical trials or observational data can augment randomized controlled trials when large-scale randomization is impractical or unethical, such as in drug evaluation for rare diseases. However, non-randomized…
Frequentists' inference often delivers point estimators associated with confidence intervals or sets for parameters of interest. Constructing the confidence intervals or sets requires understanding the sampling distributions of the point…
In modern data analysis, it is common to select a model before performing statistical inference. Selective inference tools make adjustments for the model selection process in order to ensure reliable inference post selection. In this paper,…
Randomization testing is a fundamental method in statistics, enabling inferential tasks such as testing for (conditional) independence of random variables, constructing confidence intervals in semiparametric location models, and…
In many applications, different populations are compared using data that are sampled in a biased manner. Under sampling biases, standard methods that estimate the difference between the population means yield unreliable inferences. Here we…
We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our…
As datasets grow larger, they are often distributed across multiple machines that compute in parallel and communicate with a central machine through short messages. In this paper, we focus on sparse regression and propose a new procedure…
Many testing problems are readily amenable to randomised tests such as those employing data splitting. However despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may…
We model stochastic choices with categorization. The agent preliminarly groups alternatives in homogenous disjoint classes, then randomly chooses one class and randomly picks an item within the selected class. We give a formal definition of…