Related papers: Data-Free/Data-Sparse Softmax Parameter Estimation…

Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation

The softmax function is widely used in artificial neural networks for the multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science 2021-12-24 Shaoshi Sun, Zhenyuan Zhang, BoCheng Huang, Pengbin Lei, Jianlin Su, Shengfeng Pan, Jiarun Cao

A Convex Relaxation for Weakly Supervised Classifiers

This paper introduces a general multi-class approach to weakly supervised classification. Inferring the labels and learning the parameters of the model is usually done jointly through a block-coordinate descent algorithm such as…

Machine Learning · Computer Science 2012-07-03 Armand Joulin, Francis Bach

How Free is Parameter-Free Stochastic Optimization?

We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, do fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods,…

Machine Learning · Computer Science 2024-10-22 Amit Attia, Tomer Koren

Revisiting lp-constrained Softmax Loss: A Comprehensive Study

Normalization is a vital process for any machine learning task as it controls the properties of data and affects model performance at large. The impact of particular forms of normalization, however, has so far been investigated in limited…

Machine Learning · Computer Science 2022-06-22 Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

Semantic Softmax Loss for Zero-Shot Learning

A typical pipeline for Zero-Shot Learning (ZSL) is to integrate the visual features and the class semantic descriptors into a multimodal framework with a linear or bilinear model. However, the visual features and the class semantic…

Computer Vision and Pattern Recognition · Computer Science 2017-05-23 Zhong Ji, Yunxin Sun, Yulong Yu, Jichang Guo, Yanwei Pang

A Softmax-free Loss Function Based on Predefined Optimal-distribution of Latent Features for Deep Learning Classifier

In the field of pattern classification, the training of deep learning classifiers is mostly end-to-end learning, and the loss function is the constraint on the final output (posterior probability) of the network, so the existence of Softmax…

Computer Vision and Pattern Recognition · Computer Science 2022-10-24 Qiuyu Zhu, Xuewen Zu

Towards Unbiased Exploration in Partial Label Learning

We consider learning a probabilistic classifier from partially-labelled supervision (inputs denoted with multiple possibilities) using standard neural architectures with a softmax as the final layer. We identify a bias phenomenon that can…

Machine Learning · Computer Science 2023-07-04 Zsolt Zombori, Agapi Rissaki, Kristóf Szabó, Wolfgang Gatterbauer, Michael Benedikt

Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts

Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency…

Machine Learning · Computer Science 2024-05-29 Ha Manh Bui, Anqi Liu

Solver-Free Decision-Focused Learning for Linear Optimization Problems

Mathematical optimization is a fundamental tool for decision-making in a wide range of applications. However, in many real-world scenarios, the parameters of the optimization problem are not known a priori and must be predicted from…

Machine Learning · Computer Science 2025-11-13 Senne Berden, Ali İrfan Mahmutoğulları, Dimos Tsouros, Tias Guns

More Information Supervised Probabilistic Deep Face Embedding Learning

Researches using margin based comparison loss demonstrate the effectiveness of penalizing the distance between face feature and their corresponding class centers. Despite their popularity and excellent performance, they do not explicitly…

Computer Vision and Pattern Recognition · Computer Science 2020-06-12 Ying Huang, Shangfeng Qiu, Wenwei Zhang, Xianghui Luo, Jinzhuo Wang

Unsupervised Deep Metric Learning via Orthogonality based Probabilistic Loss

Metric learning is an important problem in machine learning. It aims to group similar examples together. Existing state-of-the-art metric learning approaches require class labels to learn a metric. As obtaining class labels in all…

Computer Vision and Pattern Recognition · Computer Science 2020-09-29 Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification

Hard-label classification is usually trained with smooth surrogate losses, most prominently softmax cross-entropy. We isolate an asymptotic mechanism by which this mismatch between smooth surrogate and discrete labels produces power-law…

Machine Learning · Computer Science 2026-05-22 Marcel Kühn, Yoon Thelge, Bernd Rosenow

Strong convexity-guided hyper-parameter optimization for flatter losses

We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work establishing a relationship between flat minima and generalization, we first establish a relationship between the strong convexity of the loss…

Machine Learning · Computer Science 2024-02-08 Rahul Yedida, Snehanshu Saha

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

The Softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over…

Machine Learning · Computer Science 2019-05-15 Octavian-Eugen Ganea, Sylvain Gelly, Gary Bécigneul, Aliaksei Severyn

Learning Weakly Convex Sets in Metric Spaces

One of the central problems studied in the theory of machine learning is the question of whether, for a given class of hypotheses, it is possible to efficiently find a consistent hypothesis, i.e., which has zero training error. While…

Machine Learning · Computer Science 2024-03-21 Eike Stadtländer, Tamás Horváth, Stefan Wrobel

Relaxed Softmax for learning from Positive and Unlabeled data

In recent years, the softmax model and its fast approximations have become the de-facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation,…

Machine Learning · Statistics 2019-09-19 Ugo Tanielian, Flavian Vasile

$\epsilon$-Softmax: Approximating One-Hot Vectors for Mitigating Label Noise

Noisy labels pose a common challenge for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions to achieve noise tolerance in the presence of label noise, particularly…

Machine Learning · Computer Science 2025-08-05 Jialiang Wang, Xiong Zhou, Deming Zhai, Junjun Jiang, Xiangyang Ji, Xianming Liu

Safe Feature Elimination in Sparse Supervised Learning

We investigate fast methods that quickly eliminate variables (features) in supervised learning problems involving a convex loss function and an $\ell_1$-norm penalty, leading to a potentially substantial reduction in the number of…

Machine Learning · Computer Science 2010-10-28 Laurent El Ghaoui, Vivian Viallon, Tarek Rabbani

Hyperspherical Classification with Dynamic Label-to-Prototype Assignment

Aiming to enhance the utilization of metric space by the parametric softmax classifier, recent studies suggest replacing it with a non-parametric alternative. Although a non-parametric classifier may provide better metric space utilization,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-26 Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Sahar Rahimi Malakshan, Nasser M. Nasrabadi

Balanced Meta-Softmax for Long-Tailed Visual Recognition

Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to the mismatch between training and testing distributions. In this paper, we show that the Softmax function,…

Machine Learning · Computer Science 2020-11-24 Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, Hongsheng Li