Related papers: Generalized Error Exponents For Small Sample Unive…
The asymptotically optimal hypothesis testing problem with the general sources as the null and alternative hypotheses is studied under exponential-type error constraints on the first kind of error probability. Our fundamental philosophy in…
In the binary hypothesis testing problem, it is well known that sequentiality in taking samples eradicates the trade-off between two error exponents, yet implementing the optimal test requires the knowledge of the underlying distributions,…
Pearson's chi-squared test, from 1900, is the standard statistical tool for "hypothesis testing on distributions": namely, given samples from an unknown distribution $Q$ that may or may not equal a hypothesis distribution $P$, we want to…
We consider Bayesian multiple hypothesis problem with independent and identically distributed observations. The classical, Sanov's theorem-based, analysis of the error probability allows one to characterize the best achievable error…
This work performs a non-asymptotic analysis of the generalized Lasso under the assumption of sub-exponential data. Our main results continue recent research on the benchmark case of (sub-)Gaussian sample distributions and thereby explore…
Universal outlier hypothesis testing refers to a hypothesis testing problem where one observes a large number of length-$n$ sequences -- the majority of which are distributed according to the typical distribution $\pi$ and a small number…
We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measures and loss function. The usual sum of squares criterion is a special case…
We address the issue of performing testing inference in generalized linear models when the sample size is small. This class of models provides a straightforward way of modeling normal and non-normal data and has been widely used in several…
The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are…
In this paper, we revisit the classical goodness-of-fit problems for univariate distributions; we propose a new testing procedure based on a characterisation of the uniform distribution. Asymptotic theory for the simple hypothesis case is…
Outlier hypothesis testing is studied in a universal setting. Multiple sequences of observations are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are distributed…
Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require…
We consider linear regression in the high-dimensional regime where the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the…
A generic out-of-sample error estimate is proposed for robust $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(X,y)$ is observed and $p,n$ are of the same order. If $\psi$ is the derivative of…
The recently introduced framework of universal inference provides a new approach to constructing hypothesis tests and confidence regions that are valid in finite samples and do not rely on any specific regularity assumptions on the…
The classical asymptotic theory for parametric $M$-estimators guarantees that, in the limit of infinite sample size, the excess risk has a chi-square type distribution, even in the misspecified case. We demonstrate how self-concordance of…
We study the problem of testing the goodness of fit of categorical count data to a Poisson distribution uniform over the categories, against a class of alternatives defined by excluding an $\ell_p$ ball, $p \leq 2$, of radius $\epsilon$…
Testing hypothesis of independence between two random elements on a joint alphabet is a fundamental exercise in statistics. Pearson's chi-squared test is an effective test for such a situation when the contingency table is relatively small.…
The classical binary hypothesis testing problem is revisited. We notice that when one of the hypotheses is composite, there is an inherent difficulty in defining an optimality criterion that is both informative and well-justified. For…
The average properties of the well-known Subset Sum Problem can be studied by the means of its randomised version, where we are given a target value $z$, random variables $X_1, \ldots, X_n$, and an error parameter $\varepsilon > 0$, and we…