Related papers: Inference with generalizable classifier prediction…
Conditional selective inference requires an exact characterization of the selection event, which is often unavailable except for a few examples like the lasso. This work addresses this challenge by introducing a generic approach to estimate…
Bootstrap is a useful tool for making statistical inference, but it may provide erroneous results under complex survey sampling. Most studies about bootstrap-based inference are developed under simple random sampling and stratified random…
Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human…
As the frontiers of applied statistics progress through increasingly complex experiments we must exploit increasingly sophisticated inferential models to analyze the observations we make. In order to avoid misleading or outright erroneous…
Causal inference is often portrayed as fundamentally distinct from predictive modeling, with its own terminology, goals, and intellectual challenges. But at its core, causal inference is simply a structured instance of prediction under…
We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection $\{(S_i,l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the probability distribution of $X_i…
For discrete-valued time series, predictive inference cannot be implemented through the construction of prediction intervals to some predetermined coverage level, as this is the case for real-valued time series. To address this problem, we…
With the growth in experimental studies in education, policymakers and practitioners are interested in understanding not only what works, but for whom an intervention works. This interest in the generalizability of a study's findings has…
We propose a new problem formulation which is similar to, but more informative than, the binary multiple-instance learning problem. In this setting, we are given groups of instances (described by feature vectors) along with estimates of the…
In this paper, we propose a new statistical inference method for massive data sets, which is very simple and efficient by combining divide-and-conquer method and empirical likelihood. Compared with two popular methods (the bag of little…
Causal inference from observational data can be viewed as a missing data problem arising from a hypothetical population-scale randomized trial matched to the observational study. This links a target trial protocol with a corresponding…
Constructing tests or confidence regions that control over the error rates in the long-run is probably one of the most important problem in statistics. Yet, the theoretical justification for most methods in statistics is asymptotic. The…
With the increased interest in machine learning and big data problems, the need for large amounts of labelled data has also grown. However, it is often infeasible to get experts to label all of this data, which leads many practitioners to…
The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a…
In many applications, different populations are compared using data that are sampled in a biased manner. Under sampling biases, standard methods that estimate the difference between the population means yield unreliable inferences. Here we…
Several new methods have been proposed for performing valid inference after model selection. An older method is sampling splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting…
We study Bayesian approaches to causal inference via propensity score regression. Much of the Bayesian literature on propensity score methods have relied on approaches that cannot be viewed as fully Bayesian in the context of conventional…
It is common to show the confidence intervals or $p$-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the…
In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…
To draw scientifically meaningful conclusions and build reliable models of quantitative phenomena, cause and effect must be taken into consideration (either implicitly or explicitly). This is particularly challenging when the measurements…