Related papers: A rigorous lower confidence bound for the expectat…
Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data dependent setting…
We consider a linear regression model, with the parameter of interest a specified linear combination of the regression parameter vector. We suppose that, as a first step, a data-based model selection (e.g. by preliminary hypothesis tests or…
Estimating the mode of a unimodal distribution is a classical problem in statistics. Although there are several approaches for point-estimation of mode in the literature, very little has been explored about the interval-estimation of mode.…
For a regression problem with a binary label response, we examine the problem of constructing confidence intervals for the label probability conditional on the features. In a setting where we do not have any information about the underlying…
We analyze the (unconditional) distribution of a linear predictor that is constructed after a data-driven model selection step in a linear regression model. First, we derive the exact finite-sample cumulative distribution function (cdf) of…
Consider the observation of n iid realizations of an experiment with d>1 possible outcomes, which corresponds to a single observation of a multinomial distribution M(n,p) where p is an unknown discrete distribution on {1,...,d}. In many…
Construction of tight confidence regions and intervals is central to statistical inference and decision making. This paper develops new theory showing minimum average volume confidence regions for categorical data. More precisely, consider…
We consider the problem of distribution-free predictive inference, with the goal of producing predictive coverage guarantees that hold conditionally rather than marginally. Existing methods such as conformal prediction offer marginal…
We present a new method for constructing a confidence interval for the mean of a bounded random variable from samples of the random variable. We conjecture that the confidence interval has guaranteed coverage, i.e., that it contains the…
A novel, non-trivial, probabilistic upper bound on the entropy of an unknown one-dimensional distribution, given the support of the distribution and a sample from that distribution, is presented. No knowledge beyond the support of the…
The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of…
A confidence distribution is a complete tool for making frequentist inference for a parameter of interest $\psi$ based on an assumed parametric model. Indeed, it allows to reach point estimates, to assess their precision, to set up tests…
We describe a general method for the construction of a confidence region for the two parameters of the Negative Binomial Distribution. This is achieved by expanding the sampling distribution of Method-of-Moments estimators, using the…
The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that…
The statistics and machine learning communities have recently seen a growing interest in classification-based approaches to two-sample testing. The outcome of a classification-based two-sample test remains a rejection decision, which is not…
We consider the problem of constructing confidence intervals for the median of a response $Y \in \mathbb{R}$ conditional on features $X \in \mathbb{R}^d$ in a situation where we are not willing to make any assumption whatsoever on the…
We provide finite sample upper and lower bounds on the Binomial tail probability which are a direct application of Sanov's theorem. We then use these to obtain high probability upper and lower bounds on the minimum of i.i.d. Binomial random…
The non-convexity and intractability of distributionally robust chance constraints make them challenging to cope with. From a data-driven perspective, we propose formulating it as a robust optimization problem to ensure that the…
We review the methods of constructing confidence intervals that account for a priori information about one-sided constraints on the parameter being estimated. We show that the so-called method of sensitivity limit yields a correct solution…
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus…