Related papers: Statistical Classification via Robust Hypothesis T…
We consider Bayesian multiple statistical classification problem in the case where the unknown source distributions are estimated from the labeled training sequences, then the estimates are used as nominal distributions in a robust…
Experiments often yield non-identically distributed data for statistical analysis. Tests of hypothesis under such set-ups are generally performed using the likelihood ratio test, which is non-robust with respect to outliers and model…
In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most of existing studies that focus on discrete-valued sequences with…
This paper considers the problem of robust hypothesis testing under non-identically distributed data. We propose Wald-type tests for both simple and composite hypothesis for independent but non-homogeneous observations based on the robust…
Training a deep neural network (DNN) often involves stochastic optimization, which means each run will produce a different model. Several works suggest this variability is negligible when models have the same performance, which in the case…
Model selection is a central task in statistics, but standard methods are not robust in misspecified settings where the true data-generating process (DGP) is not in the set of candidate models. The key limitation is that existing methods --…
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…
We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between…
Robust tests of general composite hypothesis under non-identically distributed observations is always a challenge. Ghosh and Basu (2018, Statistica Sinica, 28, 1133--1155) have proposed a new class of test statistics for such problems based…
The problem of robust binary hypothesis testing is studied. Under both hypotheses, the data-generating distributions are assumed to belong to uncertainty sets constructed through moments; in particular, the sets contain distributions whose…
Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the…
When sample data are governed by an unknown sequence of independent but possibly non-identical distributions, the data-generating process (DGP) in general cannot be perfectly identified from the data. For making decisions facing such…
We discuss recently developed methods that quantify the stability and generalizability of statistical findings under distributional changes. In many practical problems, the data is not drawn i.i.d. from the target population. For example,…
Currently, statistical tests for random number generators (RNGs) are widely used in practice, and some of them are even included in information security standards. But despite the popularity of RNGs, consistent tests are known only for…
Hypothesis testing in singular statistical models is often regarded as inherently problematic due to non-identifiability and degeneracy of the Fisher information. We show that the fundamental obstruction to testing in such models is not…
Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing,…
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus…
Distributionally Robust Supervised Learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can be significantly degraded because test data may follow…
Randomly censored survival data are frequently encountered in applied sciences including biomedical or reliability applications and clinical trial analyses. Testing the significance of statistical hypotheses is crucial in such analyses to…
Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts…