Related papers: A two-head loss function for deep Average-K classi…

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification

Top-k error is currently a popular performance measure on large scale image classification benchmarks such as ImageNet and Places. Despite its wide acceptance, our understanding of this metric is limited as most of the previous research is…

Computer Vision and Pattern Recognition · Computer Science 2016-12-19 Maksim Lapin, Matthias Hein, Bernt Schiele

Loss Functions for Top-k Error: Analysis and Insights

In order to push the performance on realistic computer vision tasks, the number of classes in modern benchmark datasets has significantly increased in recent years. This increase in the number of classes comes along with increased ambiguity…

Machine Learning · Statistics 2016-04-14 Maksim Lapin, Matthias Hein, Bernt Schiele

Smooth Loss Functions for Deep Top-k Classification

The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed…

Machine Learning · Computer Science 2018-02-22 Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Multi-label image classification allows predicting a set of labels from a given image. Unlike multiclass classification, where only one label per image is assigned, such a setup is applicable for a broader range of applications. In this…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Kirill Prokofiev, Vladislav Sovrasov

Semi-supervised Learning using Robust Loss

The amount of manually labeled data is limited in medical applications, so semi-supervised learning and automatic labeling strategies can be an asset for training deep neural networks. However, the quality of the automatically generated…

Machine Learning · Computer Science 2022-03-04 Wenhui Cui, Haleh Akrami, Anand A. Joshi, Richard M. Leahy

Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification

In modern classification tasks, the number of labels is getting larger and larger, as is the size of the datasets encountered in practice. As the number of classes increases, class ambiguity and class imbalance become more and more…

Machine Learning · Statistics 2022-07-19 Camille Garcin, Maximilien Servajean, Alexis Joly, Joseph Salmon

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a…

Computation and Language · Computer Science 2019-09-11 Jiawei Wu, Wenhan Xiong, William Yang Wang

Classification Under Ambiguity: When Is Average-K Better Than Top-K?

When many labels are possible, choosing a single one can lead to low precision. A common alternative, referred to as top-$K$ classification, is to choose some number $K$ (commonly around 5) and to return the $K$ labels with the highest…

Machine Learning · Statistics 2021-12-17 Titouan Lorieul, Alexis Joly, Dennis Shasha

Exploring Alternatives to Softmax Function

The softmax function is widely used in artificial neural networks for multiclass classification, multilabel classification, attention mechanisms, etc. However, its efficacy is often questioned in the literature. The log-softmax loss has been shown…

Machine Learning · Computer Science 2020-11-24 Kunal Banerjee, Vishak Prasad C, Rishi Raj Gupta, Karthik Vyas, Anushree H, Biswajit Mishra

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network…

Computation and Language · Computer Science 2016-02-09 André F. T. Martins, Ramón Fernandez Astudillo

Balanced Meta-Softmax for Long-Tailed Visual Recognition

Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to the mismatch between training and testing distributions. In this paper, we show that the Softmax function,…

Machine Learning · Computer Science 2020-11-24 Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, Hongsheng Li

Dual-Encoders for Extreme Multi-Label Classification

Dual-encoder (DE) models are widely used in retrieval tasks, most commonly studied on open QA benchmarks that are often characterized by multi-class and limited training data. In contrast, their performance in multi-label and data-rich…

Machine Learning · Computer Science 2024-03-19 Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit Dhillon

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions. Compared to conventional single-label classification problem, multi-label recognition…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Tong Wu, Qingqiu Huang, Ziwei Liu, Yu Wang, Dahua Lin

A Layer Separation Optimization Framework for Cross-Entropy Training in Deep Learning

This paper investigates the deep learning optimization problem with softmax cross-entropy loss. We propose a layer separation strategy to alleviate the strong nonconvexity encountered during training deep networks. For cross-entropy models…

Machine Learning · Computer Science 2026-04-28 Yaru Liu, Michael K. Ng, Yiqi Gu

Stabilizing and Improving Federated Learning with Non-IID Data and Client Dropout

The label distribution skew induced data heterogeneity has been shown to be a significant obstacle that limits the model performance in federated learning, which is particularly developed for collaborative model training over decentralized…

Machine Learning · Computer Science 2023-03-16 Jian Xu, Meiling Yang, Wenbo Ding, Shao-Lun Huang

Harnessing Superclasses for Learning from Hierarchical Databases

In many large-scale classification problems, classes are organized in a known hierarchy, typically represented as a tree expressing the inclusion of classes in superclasses. We introduce a loss for this type of supervised hierarchical…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Nicolas Urbani, Sylvain Rousseau, Yves Grandvalet, Leonardo Tanzi

Quantum Probabilistic Label Refining: Enhancing Label Quality for Robust Image Classification

Learning with softmax cross-entropy on one-hot labels often leads to overconfident predictions and poor robustness under noise or perturbations. Label smoothing mitigates this by redistributing some confidence uniformly, but treats all…

Quantum Physics · Physics 2025-10-02 Fang Qi, Lu Peng, Zhengming Ding

Neural Network Classifier as Mutual Information Evaluator

Cross-entropy loss with softmax output is a standard choice to train neural network classifiers. We give a new view of neural network classifiers with softmax and cross-entropy as mutual information evaluators. We show that when the dataset…

Machine Learning · Computer Science 2021-08-17 Zhenyue Qin, Dongwoo Kim, Tom Gedeon

Weakly Supervised Multi-task Learning for Concept-based Explainability

In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain-expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations…

Machine Learning · Computer Science 2021-04-27 Catarina Belém, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

Understanding Gender and Racial Disparities in Image Recognition Models

Large-scale image classification models trained on top of popular datasets such as ImageNet have been shown to have a distributional skew which leads to disparities in prediction accuracies across different subsections of population…

Computer Vision and Pattern Recognition · Computer Science 2021-07-21 Rohan Mahadev, Anindya Chakravarti