Related papers: Statistical Hypothesis Testing Based on Machine Le…
We study the performance of machine learning binary classification techniques in terms of error probabilities. The statistical test is based on the Data-Driven Decision Function (D3F), learned in the training phase, i.e., what is…
Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing,…
The use of machine learning methods for predictive purposes has increased dramatically over the past two decades, but uncertainty quantification for predictive comparisons remains elusive. This paper addresses this gap by extending the…
This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the…
Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that maps the data into linearly separable classes. Motivated by…
Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the…
We study the properties of several likelihood-based statistics commonly used in testing for the presence of a known signal under a mixture model with known background, but unknown signal fraction. Under the null hypothesis of no signal, all…
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets which are often very large in size. The very task of feature discovery from data…
A common approach to statistical learning with big-data is to randomly split it among $m$ machines and learn the parameter of interest by averaging the $m$ individual estimates. In this paper, focusing on empirical risk minimization, or…
We consider Bayesian multiple statistical classification problem in the case where the unknown source distributions are estimated from the labeled training sequences, then the estimates are used as nominal distributions in a robust…
The theory of stochastic approximations form the theoretical foundation for studying convergence properties of many popular recursive learning algorithms in statistics, machine learning and statistical physics. Large deviations for…
Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there…
Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML resulting in theoretical or empirical…
In statistical classification and machine learning, classification error is an important performance measure, which is minimized by the Bayes decision rule. In practice, the unknown true distribution is usually replaced with a model…
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted significant research interest. Despite this interest, however, recent work…
The inability of Machine Learning (ML) models to successfully extrapolate correct predictions from out-of-distribution (OoD) samples is a major hindrance to the application of ML in critical applications. Until the generalization ability of…
Although deep neural networks are effective on supervised learning tasks, they have been shown to be brittle. They are prone to overfitting on their training distribution and are easily fooled by small adversarial perturbations. In this…
This paper studies probabilistic rates of convergence for consensus+innovations type of algorithms in random, generic networks. For each node, we find a lower and also a family of upper bounds on the large deviations rate function, thus…
In a world where Machine Learning (ML) is increasingly deployed to support decision-making in critical domains, providing decision-makers with explainable, stable, and relevant inputs becomes fundamental. Understanding how machine learning…
Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach…