Related papers: Statistical Hypothesis Testing Based on Machine Le…

Large Deviations for Classification Performance Analysis of Machine Learning Systems

We study the performance of machine learning binary classification techniques in terms of error probabilities. The statistical test is based on the Data-Driven Decision Function (D3F), learned in the training phase, i.e., what is…

Machine Learning · Computer Science · 2023-01-19 · Paolo Braca, Leonardo M. Millefiori, Augusto Aubry, Antonio De Maio, Peter Willett

Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing,…

Machine Learning · Computer Science · 2020-11-17 · Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

Extending the Scope of Inference About Predictive Ability to Machine Learning Methods

The use of machine learning methods for predictive purposes has increased dramatically over the past two decades, but uncertainty quantification for predictive comparisons remains elusive. This paper addresses this gap by extending the…

Econometrics · Economics · 2025-05-09 · Juan Carlos Escanciano, Ricardo Parra

Non-Asymptotic Performance of Social Machine Learning Under Limited Data

This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the…

Machine Learning · Computer Science · 2024-07-10 · Ping Hu, Virginia Bordignon, Mert Kayaalp, Ali H. Sayed

The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime

Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by…

Statistics Theory · Mathematics · 2023-03-23 · Andrea Montanari, Feng Ruan, Youngtak Sohn, Jun Yan

Good Classifiers are Abundant in the Interpolating Regime

Within the machine learning community, the widely used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the…

Machine Learning · Statistics · 2021-03-05 · Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney

Improved Inference for the Signal Significance

We study the properties of several likelihood-based statistics commonly used in testing for the presence of a known signal under a mixture model with known background, but unknown signal fraction. Under the null hypothesis of no signal, all…

Data Analysis, Statistics and Probability · Physics · 2018-12-26 · Igor Volobouev, A. Alexandre Trindade

A Bayesian Perspective of Statistical Machine Learning for Big Data

Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets which are often very large in size. The very task of feature discovery from data…

Machine Learning · Computer Science · 2018-11-14 · Rajiv Sambasivan, Sourish Das, Sujit K Sahu

On the Optimality of Averaging in Distributed Statistical Learning

A common approach to statistical learning with big-data is to randomly split it among $m$ machines and learn the parameter of interest by averaging the $m$ individual estimates. In this paper, focusing on empirical risk minimization, or…

Machine Learning · Statistics · 2016-06-14 · Jonathan Rosenblatt, Boaz Nadler
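The split-and-average scheme in this abstract can be illustrated with a short sketch. It assumes squared loss and a one-dimensional linear model y ≈ a + b·x, so each machine's empirical risk minimizer is its ordinary least-squares fit. The function names `ols_slope_intercept` and `split_and_average` are hypothetical, and this is not the authors' code or analysis.

```python
import random
from statistics import fmean


def ols_slope_intercept(xs, ys):
    """Least-squares fit y ~ a + b*x on one shard (local ERM under squared loss)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def split_and_average(xs, ys, m, seed=0):
    """Randomly split the data among m 'machines', fit each shard, average the m estimates."""
    if m < 1 or len(xs) < 2 * m:
        raise ValueError("need m >= 1 and at least two points per shard")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # random split of the sample
    shards = [idx[k::m] for k in range(m)]  # m disjoint shards of (nearly) equal size
    fits = [ols_slope_intercept([xs[i] for i in s], [ys[i] for i in s]) for s in shards]
    # One-shot averaging: each machine sends only its (a, b) estimate.
    a_bar = fmean(a for a, _ in fits)
    b_bar = fmean(b for _, b in fits)
    return a_bar, b_bar


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(200)]
    ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    print("pooled  :", ols_slope_intercept(xs, ys))
    print("averaged:", split_and_average(xs, ys, m=4))
```

With `m=1` the sketch reduces to the pooled least-squares fit. For `m > 1` the averaged estimate generally differs from the pooled one.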

Statistical Classification via Robust Hypothesis Testing: Non-Asymptotic and Simple Bounds

We consider the Bayesian multiple statistical classification problem in the case where the unknown source distributions are estimated from the labeled training sequences and the estimates are then used as nominal distributions in a robust…

Information Theory · Computer Science · 2021-10-11 · Hüseyin Afşer

A weak convergence approach to large deviations for stochastic approximations

The theory of stochastic approximations forms the theoretical foundation for studying convergence properties of many popular recursive learning algorithms in statistics, machine learning and statistical physics. Large deviations for…

Probability · Mathematics · 2025-02-05 · Henrik Hult, Adam Lindhe, Pierre Nyquist, Guo-Jhen Wu

A modern maximum-likelihood theory for high-dimensional logistic regression

Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there…

Statistics Theory · Mathematics · 2022-06-08 · Pragya Sur, Emmanuel J. Candes

Automatically detecting data drift in machine learning classifiers

Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML resulting in theoretical or empirical…

Machine Learning · Computer Science · 2021-11-11 · Samuel Ackerman, Orna Raz, Marcel Zalmanovici, Aviad Zlotnick

Classification Error Bound for Low Bayes Error Conditions in Machine Learning

In statistical classification and machine learning, classification error is an important performance measure, which is minimized by the Bayes decision rule. In practice, the unknown true distribution is usually replaced with a model…

Machine Learning · Computer Science · 2025-01-28 · Zijian Yang, Vahe Eminyan, Ralf Schlüter, Hermann Ney
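The abstract's starting point, that the Bayes decision rule minimizes classification error, can be checked numerically. The sketch below assumes two classes with known univariate Gaussian class-conditional densities sharing one standard deviation. The rule picks the class with the largest log prior plus log likelihood. In this special case its error has a closed form that a Monte Carlo run should approximately match. All function names are hypothetical and unrelated to the paper's own method.

```python
import math
import random


def log_gauss_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def bayes_decide(x, priors, mus, sigma):
    """Bayes decision rule: choose the class maximizing log prior + log likelihood."""
    scores = [math.log(p) + log_gauss_pdf(x, mu, sigma) for p, mu in zip(priors, mus)]
    return max(range(len(scores)), key=scores.__getitem__)


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_error_two_gaussians(p0, mu0, mu1, sigma):
    """Closed-form Bayes error for two equal-variance Gaussian classes (assumes mu0 < mu1)."""
    if not mu0 < mu1:
        raise ValueError("expected mu0 < mu1")
    p1 = 1.0 - p0
    # Threshold t solves p0 * N(t; mu0) = p1 * N(t; mu1); class 1 is chosen for x > t.
    t = (mu0 + mu1) / 2.0 + sigma ** 2 * math.log(p0 / p1) / (mu1 - mu0)
    return p0 * (1.0 - normal_cdf((t - mu0) / sigma)) + p1 * normal_cdf((t - mu1) / sigma)


def empirical_error(p0, mu0, mu1, sigma, n=10000, seed=0):
    """Monte Carlo estimate of the misclassification rate of bayes_decide."""
    rng = random.Random(seed)
    priors, mus = [p0, 1.0 - p0], [mu0, mu1]
    errors = 0
    for _ in range(n):
        y = 0 if rng.random() < p0 else 1
        x = rng.gauss(mus[y], sigma)
        errors += bayes_decide(x, priors, mus, sigma) != y
    return errors / n


if __name__ == "__main__":
    print("closed form:", bayes_error_two_gaussians(0.5, 0.0, 2.0, 1.0))
    print("Monte Carlo:", empirical_error(0.5, 0.0, 2.0, 1.0, n=20000))
```

With equal priors, means 0 and 2, and unit variance, the threshold sits at 1 and the Bayes error equals Φ(−1), roughly 0.159. A plug-in rule built from estimated densities can only match or exceed this value.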

Attribute-to-Delete: Machine Unlearning via Datamodel Matching

Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted significant research interest. Despite this interest, however, recent work…

Machine Learning · Computer Science · 2024-11-13 · Kristian Georgiev, Roy Rinberg, Sung Min Park, Shivam Garg, Andrew Ilyas, Aleksander Madry, Seth Neel

Prediction Confidence from Neighbors

The inability of Machine Learning (ML) models to successfully extrapolate correct predictions from out-of-distribution (OoD) samples is a major hindrance to the application of ML in critical applications. Until the generalization ability of…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-01 · Mark Philip Philipsen, Thomas Baltzer Moeslund

Understanding Classifier Mistakes with Generative Models

Although deep neural networks are effective on supervised learning tasks, they have been shown to be brittle. They are prone to overfitting on their training distribution and are easily fooled by small adversarial perturbations. In this…

Machine Learning · Computer Science · 2020-10-07 · Laëtitia Shao, Yang Song, Stefano Ermon

Inaccuracy rates for distributed inference over random networks with applications to social learning

This paper studies probabilistic rates of convergence for consensus+innovations type of algorithms in random, generic networks. For each node, we find a lower and also a family of upper bounds on the large deviations rate function, thus…

Information Theory · Computer Science · 2022-08-11 · Dragana Bajovic

Preservation of Feature Stability in Machine Learning Under Data Uncertainty for Decision Support in Critical Domains

In a world where Machine Learning (ML) is increasingly deployed to support decision-making in critical domains, providing decision-makers with explainable, stable, and relevant inputs becomes fundamental. Understanding how machine learning…

Machine Learning · Computer Science · 2024-08-07 · Karol Capała, Paulina Tworek, Jose Sousa

Learning continuous models for continuous physics

Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach…

Machine Learning · Computer Science · 2023-11-23 · Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson, Michael W. Mahoney