Related papers: Exploring Alternatives to Softmax Function

An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family

In a multi-class classification problem, it is standard to model the output of a neural network as a categorical distribution conditioned on the inputs. The output must therefore be positive and sum to one, which is traditionally enforced…

Neural and Evolutionary Computing · Computer Science 2016-03-01 Alexandre de Brébisson, Pascal Vincent

Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation

The softmax function is widely used in artificial neural networks for the multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science 2021-12-24 Shaoshi Sun, Zhenyuan Zhang, BoCheng Huang, Pengbin Lei, Jianlin Su, Shengfeng Pan, Jiarun Cao

The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family

Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size…

Machine Learning · Computer Science 2016-05-30 Alexandre de Brébisson, Pascal Vincent

A Quantitative Evaluation of Approximate Softmax Functions for Deep Neural Networks

The softmax function is a widely used activation function in the output layers of neural networks, responsible for converting raw scores into class probabilities while introducing essential non-linearity. Implementing Softmax efficiently…

Hardware Architecture · Computer Science 2026-04-09 Anthony Leiva-Valverde, Fabricio Elizondo-Fernández, Luis G. León-Vega, Cristina Meinhardt, Jorge Castro-Godínez

Additive Margin Softmax for Face Verification

In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric…

Computer Vision and Pattern Recognition · Computer Science 2018-05-31 Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng

Ensemble Soft-Margin Softmax Loss for Image Classification

Softmax loss is arguably one of the most popular losses to train CNN models for image classification. However, recent works have exposed its limitation on feature discriminability. This paper casts a new viewpoint on the weakness of softmax…

Computer Vision and Pattern Recognition · Computer Science 2018-05-11 Xiaobo Wang, Shifeng Zhang, Zhen Lei, Si Liu, Xiaojie Guo, Stan Z. Li

Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss

The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. To elucidate its theoretical properties, the Fenchel-Young framework situates it as a canonical instance within a broad family of…

Machine Learning · Computer Science 2026-02-02 Yuanhao Pu, Defu Lian, Enhong Chen

Relaxed Softmax for learning from Positive and Unlabeled data

In recent years, the softmax model and its fast approximations have become the de-facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation,…

Machine Learning · Statistics 2019-09-19 Ugo Tanielian, Flavian Vasile

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network…

Computation and Language · Computer Science 2016-02-09 André F. T. Martins, Ramón Fernandez Astudillo

Loss Function Search for Face Recognition

In face recognition, designing margin-based (e.g., angular, additive, additive angular margins) softmax loss functions plays an important role in learning discriminative features. However, these hand-crafted heuristic methods are…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Xiaobo Wang, Shuo Wang, Cheng Chi, Shifeng Zhang, Tao Mei

Additive Phoneme-aware Margin Softmax Loss for Language Recognition

This paper proposes an additive phoneme-aware margin softmax (APM-Softmax) loss to train the multi-task learning network with phonetic information for language recognition. In additive margin softmax (AM-Softmax) loss, the margin is set as…

Sound · Computer Science 2021-06-25 Zheng Li, Yan Liu, Lin Li, Qingyang Hong

Loss Functions for Top-k Error: Analysis and Insights

In order to push the performance on realistic computer vision tasks, the number of classes in modern benchmark datasets has significantly increased in recent years. This increase in the number of classes comes along with increased ambiguity…

Machine Learning · Statistics 2016-04-14 Maksim Lapin, Matthias Hein, Bernt Schiele

Revisiting lp-constrained Softmax Loss: A Comprehensive Study

Normalization is a vital process for any machine learning task as it controls the properties of data and affects model performance at large. The impact of particular forms of normalization, however, has so far been investigated in limited…

Machine Learning · Computer Science 2022-06-22 Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

Effectiveness of MPC-friendly Softmax Replacement

Softmax is widely used in deep learning to map some representation to a probability distribution. As it is based on exp/log functions that are relatively expensive in multi-party computation, Mohassel and Zhang (2017) proposed a simpler…

Machine Learning · Computer Science 2021-07-07 Marcel Keller, Ke Sun

To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models

Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data to work. A well known technique to reduce…

Machine Learning · Computer Science 2025-03-13 Julius Gonsior, Christian Falkenberg, Silvio Magino, Anja Reusch, Maik Thiele, Wolfgang Lehner

PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation

Softmax Loss (SL) is widely applied in recommender systems (RS) and has demonstrated effectiveness. This work analyzes SL from a pairwise perspective, revealing two significant limitations: 1) the relationship between SL and conventional…

Machine Learning · Computer Science 2025-08-05 Weiqin Yang, Jiawei Chen, Xin Xin, Sheng Zhou, Binbin Hu, Yan Feng, Chun Chen, Can Wang

Mis-classified Vector Guided Softmax Loss for Face Recognition

Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central task of which is how to improve the feature discrimination. To this end, several margin-based (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2019-12-03 Xiaobo Wang, Shifeng Zhang, Shuo Wang, Tianyu Fu, Hailin Shi, Tao Mei

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification

Top-k error is currently a popular performance measure on large scale image classification benchmarks such as ImageNet and Places. Despite its wide acceptance, our understanding of this metric is limited as most of the previous research is…

Computer Vision and Pattern Recognition · Computer Science 2016-12-19 Maksim Lapin, Matthias Hein, Bernt Schiele

Large-Margin Softmax Loss for Convolutional Neural Networks

Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly…

Machine Learning · Statistics 2017-11-21 Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang

Sigsoftmax: Reanalysis of the Softmax Bottleneck

Softmax is an output activation function for modeling categorical probability distributions in many applications of deep learning. However, a recent study revealed that softmax can be a bottleneck of representational capacity of neural…

Machine Learning · Statistics 2018-05-29 Sekitoshi Kanai, Yasuhiro Fujiwara, Yuki Yamanaka, Shuichi Adachi