Related papers: Adaptive Sampled Softmax with Kernel Based Samplin…

200 papers

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of…

Machine Learning · Computer Science 2020-01-01 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

The softmax function is a cornerstone of multi-class classification, integral to a wide range of machine learning applications, from large-scale retrieval and ranking models to advanced large language models. However, its computational cost…

Machine Learning · Computer Science 2025-01-16 Jin Chen, Jin Zhang, Xu huang, Yi Yang, Defu Lian, Enhong Chen

The softmax function is widely used in artificial neural networks for multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science 2021-12-24 Shaoshi Sun, Zhenyuan Zhang, BoCheng Huang, Pengbin Lei, Jianlin Su, Shengfeng Pan, Jiarun Cao

Quantum machine learning with quantum kernels for classification problems is a growing area of research. Recently, quantum kernel alignment techniques that parameterise the kernel have been developed, allowing the kernel to be trained and…

To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria are proposed and investigated in the context of large vocabulary word-based…

Computation and Language · Computer Science 2022-06-20 Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney

It is well-known that the high computational complexity and the insufficient samples in large-scale array signal processing restrict the real-world applications of the conventional full-dimensional adaptive beamforming (sample matrix…

Information Theory · Computer Science 2014-05-20 Hu Xie, Da-Zheng Feng, Ming-Dong Yuan

Improvement of statistical learning models in order to increase efficiency in solving classification or regression problems is still a goal pursued by the scientific community. In this way, the support vector machine model is one of the…

Machine Learning · Statistics 2019-11-22 Anderson Ara, Mateus Maia, Samuel Macêdo, Francisco Louzada

Kernel quadrature is widely used to approximate integrals of smooth functions, with worst-case error typically decaying at the minimax rate $n^{-\alpha/d}$ for smoothness $\alpha$ in dimension $d$. Existing rate-optimal methods often depend…

Computation · Statistics 2026-05-19 Edoardo Bandoni, Christian Robert, Julien Stoehr

We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization. The key intuition behind our approach is to use at…

The Softmax bottleneck was first identified in language modeling as a theoretical limit on the expressivity of Softmax-based models. Being one of the most widely-used methods to output probability, Softmax-based models have found a wide…

Machine Learning · Computer Science 2021-10-12 Ying-Chen Lin

There have long been debates about how to interpret neural networks and understand the decisions our models make. One specific question is why deep neural networks tend to be error-prone when dealing with samples that output low softmax scores. We…

Computer Vision and Pattern Recognition · Computer Science 2018-12-04 Simiao Zuo, Jialin Wu

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science 2023-04-27 Yichuan Deng, Zhihang Li, Zhao Song

Softmax classifiers with a very large number of classes naturally occur in many applications such as natural language processing and information retrieval. The calculation of full softmax is costly from the computational and energy…

Machine Learning · Computer Science 2021-07-30 Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava

Machine learning models trained on uncurated datasets can often end up adversely affecting inputs belonging to underrepresented groups. To address this issue, we consider the problem of adaptively constructing training sets which allow us…

Machine Learning · Computer Science 2021-07-21 Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency…

Machine Learning · Computer Science 2024-05-29 Ha Manh Bui, Anqi Liu

In deep learning models, learning more with less data is becoming more important. This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency. Moreover, we…

Softmax is a popular normalization method used in machine learning. Deep learning solutions like Transformer or BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our methodology…

Mathematical Software · Computer Science 2019-05-28 Jacek Czaja, Michal Gallus, Tomasz Patejko, Jian Tang

We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned…

Machine Learning · Computer Science 2021-03-22 Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

The Softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute, which limits the vocabulary size to a subset of most frequent…

Computation and Language · Computer Science 2019-03-25 Sachin Kumar, Yulia Tsvetkov

We propose DropMax, a stochastic version of softmax classifier which at each iteration drops non-target classes according to dropout probabilities adaptively decided for each instance. Specifically, we overlay binary masking variables over…

Machine Learning · Computer Science 2018-11-05 Hae Beom Lee, Juho Lee, Saehoon Kim, Eunho Yang, Sung Ju Hwang