Related papers: Data organization limits the predictability of bin…

200 papers

The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in…

Statistics Theory · Mathematics 2022-09-27 Shashank Singh, Justin Khim

Despite the widespread use of machine learning algorithms to solve problems of technological, economic, and social relevance, provable guarantees on the performance of these data-driven algorithms are critically lacking, especially when the…

Machine Learning · Computer Science 2019-03-18 Abed AlRahman Al Makdah, Vaibhav Katewa, Fabio Pasqualetti

Data classification, the process of analyzing data and organizing it into categories, is a fundamental computing problem of natural and artificial information processing systems. Ideally, the performance of classifier models would be…

Machine Learning · Computer Science 2022-06-07 Claus Metzner, Achim Schilling, Maximilian Traxdorf, Konstantin Tziridis, Holger Schulze, Patrick Krauss

Recent research demonstrated that training large language models involves memorization of a significant fraction of training data. Such memorization can lead to privacy violations when training on sensitive user data and thus motivates the…

Machine Learning · Computer Science 2025-10-29 Vitaly Feldman, Guy Kornowski, Xin Lyu

We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset.…

Machine Learning · Computer Science 2020-11-10 Daniel Bashir, George D. Montanez, Sonia Sehra, Pedro Sandoval Segura, Julius Lauw

In most machine learning applications, classification accuracy is not the primary metric of interest. Binary classifiers which face class imbalance are often evaluated by the $F_\beta$ score, area under the precision-recall curve, Precision…

Machine Learning · Computer Science 2018-03-02 Alan Mackey, Xiyang Luo, Elad Eban

Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident. It is commonly believed that such over-confidence is mainly due…

Machine Learning · Computer Science 2021-07-21 Yu Bai, Song Mei, Huan Wang, Caiming Xiong

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In the typical setting investigated till now, each classifier is trained…

Machine Learning · Computer Science 2007-05-23 Kagan Tumer, Joydeep Ghosh

Excessive reuse of holdout data can lead to overfitting. However, there is little concrete evidence of significant overfitting due to holdout reuse in popular multiclass benchmarks today. Known results show that, in the worst-case,…

Machine Learning · Computer Science 2019-05-27 Vitaly Feldman, Roy Frostig, Moritz Hardt

Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound).…

Machine Learning · Computer Science 2023-11-21 Gundeep Arora, Srujana Merugu, Anoop Saladi, Rajeev Rastogi

Data driven classification that relies on neural networks is based on optimization criteria that involve some form of distance between the output of the network and the desired label. Using the same mathematical analysis, for a multitude of…

Machine Learning · Computer Science 2019-06-25 Kalliopi Basioti, George V. Moustakides

In classification problems, especially those that categorize data into a large number of classes, the classes often naturally follow a hierarchical structure. That is, some classes are likely to share similar structures and features. Those…

Machine Learning · Computer Science 2018-07-25 Denali Molitor, Deanna Needell

Classification of datasets into two or more distinct classes is an important machine learning task. Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily…

Machine Learning · Computer Science 2020-08-26 Yashesh Dhebar, Sparsh Gupta, Kalyanmoy Deb

Deep neural networks (DNNs) trained with the logistic loss (i.e., the cross entropy loss) have made impressive advancements in various binary classification tasks. However, generalization analysis for binary classification with DNNs and…

Machine Learning · Statistics 2024-04-23 Zihan Zhang, Lei Shi, Ding-Xuan Zhou

Binary classifiers trained on a certain proportion of positive items introduce a bias when applied to data sets with different proportions of positive items. Most solutions for dealing with this issue assume that some information on the…

Machine Learning · Statistics 2021-02-18 Marco J. H. Puts, Piet J. H. Daas

Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that…

Machine Learning · Computer Science 2025-03-13 Shoma Yokura, Akihisa Ichiki

We revisit the foundations of fairness and its interplay with utility and efficiency in settings where the training data contain richer labels, such as individual types, rankings, or risk estimates, rather than just binary outcomes. In this…

Machine Learning · Computer Science 2025-05-23 Noga Amit, Omer Reingold, Guy N. Rothblum

The unwavering success of deep learning in the past decade led to the increasing prevalence of deep learning methods in various application fields. However, the downsides of deep learning, most prominently its lack of trustworthiness, may…

Machine Learning · Computer Science 2024-08-13 Holger Boche, Vit Fojtik, Adalbert Fono, Gitta Kutyniok

We consider the problem of learning a binary classifier from $n$ different data sources, among which at most an $\eta$ fraction are adversarial. The overhead is defined as the ratio between the sample complexity of learning in this setting…

Machine Learning · Computer Science 2018-05-15 Mingda Qiao

We provide a novel characterization of semiparametric efficiency in a generic supervised learning setting where the outcome mean function -- defined as the conditional expectation of the outcome of interest given the other observed…

Methodology · Statistics 2025-04-22 Harrison H. Li