Related papers: Adaptive Sampled Softmax with Kernel Based Samplin…

Sampled Softmax with Random Fourier Features

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of…

Machine Learning · Computer Science · 2020-01-01 · Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications

The softmax function is a cornerstone of multi-class classification, integral to a wide range of machine learning applications, from large-scale retrieval and ranking models to advanced large language models. However, its computational cost…

Machine Learning · Computer Science · 2025-01-16 · Jin Chen, Jin Zhang, Xu Huang, Yi Yang, Defu Lian, Enhong Chen

Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation

The softmax function is widely used in artificial neural networks for the multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science · 2021-12-24 · Shaoshi Sun, Zhenyuan Zhang, BoCheng Huang, Pengbin Lei, Jianlin Su, Shengfeng Pan, Jiarun Cao

Efficient Parameter Optimisation for Quantum Kernel Alignment: A Sub-sampling Approach in Variational Training

Quantum machine learning with quantum kernels for classification problems is a growing area of research. Recently, quantum kernel alignment techniques that parameterise the kernel have been developed, allowing the kernel to be trained and…

Quantum Physics · Physics · 2024-10-23 · M. Emre Sahin, Benjamin C. B. Symons, Pushpak Pati, Fayyaz Minhas, Declan Millar, Maria Gabrani, Stefano Mensa, Jan Lukas Robertus

Self-Normalized Importance Sampling for Neural Language Modeling

To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria are proposed and investigated in the context of large vocabulary word-based…

Computation and Language · Computer Science · 2022-06-20 · Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney

Fast Adaptive Beamforming based on kernel method under Small Sample Support

It is well-known that the high computational complexity and the insufficient samples in large-scale array signal processing restrict the real-world applications of the conventional full-dimensional adaptive beamforming (sample matrix…

Information Theory · Computer Science · 2014-05-20 · Hu Xie, Da-Zheng Feng, Ming-Dong Yuan

Random Machines: A bagged-weighted support vector model with free kernel choice

Improvement of statistical learning models in order to increase efficiency in solving classification or regression problems is still a goal pursued by the scientific community. In this way, the support vector machine model is one of the…

Machine Learning · Statistics · 2019-11-22 · Anderson Ara, Mateus Maia, Samuel Macêdo, Francisco Louzada

Optimal Sampling for Kernel Quadrature on Unbounded Domains

Kernel quadrature is widely used to approximate integrals of smooth functions, with worst-case error typically decaying at the minimax rate $n^{-\alpha/d}$ for smoothness $\alpha$ in dimension $d$. Existing rate-optimal methods often depend…

Computation · Statistics · 2026-05-19 · Edoardo Bandoni, Christian Robert, Julien Stoehr

Active Sampling for Min-Max Fairness

We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization. The key intuition behind our approach is to use at…

Machine Learning · Statistics · 2022-06-20 · Jacob Abernethy, Pranjal Awasthi, Matthäus Kleindessner, Jamie Morgenstern, Chris Russell, Jie Zhang

Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling

The Softmax bottleneck was first identified in language modeling as a theoretical limit on the expressivity of Softmax-based models. Being one of the most widely-used methods to output probability, Softmax-based models have found a wide…

Machine Learning · Computer Science · 2021-10-12 · Ying-Chen Lin

Image Score: How to Select Useful Samples

There have long been debates on how we could interpret neural networks and understand the decisions our models make, in particular why deep neural networks tend to be error-prone when dealing with samples that output low softmax scores. We…

Computer Vision and Pattern Recognition · Computer Science · 2018-12-04 · Simiao Zuo, Jialin Wu

Attention Scheme Inspired Softmax Regression

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science · 2023-04-27 · Yichuan Deng, Zhihang Li, Zhao Song

A Tale of Two Efficient and Informative Negative Sampling Distributions

Softmax classifiers with a very large number of classes naturally occur in many applications such as natural language processing and information retrieval. The calculation of full softmax is costly from the computational and energy…

Machine Learning · Computer Science · 2021-07-30 · Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava

Adaptive Sampling for Minimax Fair Classification

Machine learning models trained on uncurated datasets can often end up adversely affecting inputs belonging to underrepresented groups. To address this issue, we consider the problem of adaptively constructing training sets which allow us…

Machine Learning · Computer Science · 2021-07-21 · Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts

Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency…

Machine Learning · Computer Science · 2024-05-29 · Ha Manh Bui, Anqi Liu

Improving Sample Efficiency with Normalized RBF Kernels

In deep learning models, learning more with less data is becoming more important. This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency. Moreover, we…

Machine Learning · Computer Science · 2020-08-03 · Sebastian Pineda-Arango, David Obando-Paniagua, Alperen Dedeoglu, Philip Kurzendörfer, Friedemann Schestag, Randolf Scholz

Softmax Optimizations for Intel Xeon Processor-based Platforms

Softmax is a popular normalization method used in machine learning. Deep learning solutions like Transformer or BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our methodology…

Mathematical Software · Computer Science · 2019-05-28 · Jacek Czaja, Michal Gallus, Tomasz Patejko, Jian Tang

Kernelized Classification in Deep Networks

We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned…

Machine Learning · Computer Science · 2021-03-22 · Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

The Softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute which limits the vocabulary size to a subset of most frequent…

Computation and Language · Computer Science · 2019-03-25 · Sachin Kumar, Yulia Tsvetkov

DropMax: Adaptive Variational Softmax

We propose DropMax, a stochastic version of softmax classifier which at each iteration drops non-target classes according to dropout probabilities adaptively decided for each instance. Specifically, we overlay binary masking variables over…

Machine Learning · Computer Science · 2018-11-05 · Hae Beom Lee, Juho Lee, Saehoon Kim, Eunho Yang, Sung Ju Hwang
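Several entries in this list (for example Sampled Softmax with Random Fourier Features and A Tale of Two Efficient and Informative Negative Sampling Distributions) deal with sampled softmax. Instead of normalizing over all n classes, the loss is computed over the target and m sampled negatives, and each sampled logit o_j is corrected by subtracting ln(m q_j), where q is the proposal distribution. Below is a minimal pure-Python sketch of that generic idea. It is not the method of any listed paper. The helpers `uniform_proposal` and `proportional_proposal` are hypothetical, and excluding the target from sampling is an assumption made to keep the example short.

```python
import math
import random


def full_softmax_loss(logits, target):
    """Softmax cross-entropy over all classes: log(sum_j exp(o_j)) - o_target.

    Every one of the n logits enters the normalizer, so the cost is O(n).
    """
    m = max(logits)  # shift by the max logit for numerical stability
    log_z = m + math.log(sum(math.exp(o - m) for o in logits))
    return log_z - logits[target]


def sampled_softmax_loss(logits, target, q, num_samples, rng):
    """Sampled softmax loss with the log(m * q_j) correction (illustrative sketch).

    Assumptions for this sketch:
    - q is a proposal distribution over class indices with q[target] == 0,
      so negatives are drawn only from the non-target classes;
    - negatives are drawn i.i.d. with replacement from q;
    - the full logits list is passed for simplicity; a real implementation
      would compute logits only for the target and the sampled classes.
    """
    negatives = rng.choices(range(len(logits)), weights=q, k=num_samples)
    # The target logit is kept as-is; each sampled logit o_j becomes o_j - log(m * q_j).
    corrected = [logits[target]] + [
        logits[j] - math.log(num_samples * q[j]) for j in negatives
    ]
    # Softmax cross-entropy over the m + 1 corrected logits, target at index 0.
    return full_softmax_loss(corrected, 0)


def uniform_proposal(n, target):
    """Hypothetical helper: uniform q over the n - 1 non-target classes."""
    return [0.0 if j == target else 1.0 / (n - 1) for j in range(n)]


def proportional_proposal(logits, target):
    """Hypothetical helper: q_j proportional to exp(o_j) over the non-target classes."""
    z_neg = sum(math.exp(o) for j, o in enumerate(logits) if j != target)
    return [0.0 if j == target else math.exp(o) / z_neg for j, o in enumerate(logits)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 0.5, -1.0, 1.5, 0.0]
    target = 0
    print("full softmax loss:", full_softmax_loss(logits, target))
    uniform_q = uniform_proposal(len(logits), target)
    print("sampled, uniform q:", sampled_softmax_loss(logits, target, uniform_q, 2, rng))
    # With q proportional to exp(o_j), each sample adds exactly Z_neg / m to the
    # normalizer, so this matches the full loss up to floating-point rounding.
    ideal_q = proportional_proposal(logits, target)
    print("sampled, q ~ exp(o):", sampled_softmax_loss(logits, target, ideal_q, 2, rng))
```

The second proposal shows why the sampling distribution matters. When q_j ∝ exp(o_j) over the negatives, every corrected sample contributes exactly Z_neg/m to the normalizer, so the sampled loss reproduces the full loss for any m. A cheap proposal such as the uniform one gives a biased estimate instead, which motivates proposals that try to approximate the softmax distribution more closely.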