Related papers: Softmax Optimizations for Intel Xeon Processor-bas…

200 papers

Transformers have transformed the field of natural language processing. This performance is largely attributed to the use of stacked self-attention layers, each of which consists of matrix multiplies as well as softmax operations. As a…

Hardware Architecture · Computer Science 2021-03-18 Jacob R. Stevens, Rangharajan Venkatesan, Steve Dai, Brucek Khailany, Anand Raghunathan

The Softmax function is ubiquitous in machine learning, and multiple previous works have suggested faster alternatives to it. In this paper we propose a way to compute classical Softmax with fewer memory accesses and hypothesize that this reduction…

Performance · Computer Science 2018-07-31 Maxim Milakov, Natalia Gimelshein

The softmax function is a widely used activation function in the output layers of neural networks, responsible for converting raw scores into class probabilities while introducing essential non-linearity. Implementing Softmax efficiently…

Learning distributed representations, or embeddings, that encode the relational similarity patterns among objects is a relevant task in machine learning. A popular method to learn the embedding matrices $X, Y$ is optimizing a loss function…

Machine Learning · Computer Science 2025-06-03 Lorenzo Dall'Amico, Enrico Maria Belliardo

Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits. However, by producing a dense probability distribution each token in the vocabulary has a nonzero chance of being…

Computation and Language · Computer Science 2022-05-20 Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov

The softmax function is widely used in artificial neural networks for multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science 2021-12-24 Shaoshi Sun, Zhenyuan Zhang, BoCheng Huang, Pengbin Lei, Jianlin Su, Shengfeng Pan, Jiarun Cao

Softmax is widely used in deep learning to map some representation to a probability distribution. As it is based on exp/log functions that are relatively expensive in multi-party computation, Mohassel and Zhang (2017) proposed a simpler…

Machine Learning · Computer Science 2021-07-07 Marcel Keller, Ke Sun

There has been a rapid advance of custom hardware (HW) for accelerating the inference speed of deep neural networks (DNNs). Previously, the softmax layer was not a main concern of DNN accelerating HW, because its portion is relatively small…

Machine Learning · Computer Science 2021-11-23 Ihor Vasyltsov, Wooseok Chang

Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. A softmax model has linear costs in the number of classes which makes it…

Machine Learning · Computer Science 2018-08-03 Guy Blanc, Steffen Rendle

ASR systems are deployed across diverse environments, each with specific hardware constraints. We use supernet training to jointly train multiple encoders of varying sizes, enabling dynamic model size adjustment to fit hardware constraints…

Computation and Language · Computer Science 2025-02-05 Jingjing Xu, Eugen Beck, Zijian Yang, Ralf Schlüter

The softmax function is a fundamental component in deep learning. This study delves into the often-overlooked parameter within the softmax function, known as "temperature," providing novel insights into the practical and theoretical aspects…

Machine Learning · Computer Science 2025-03-03 Hao Xuan, Bokai Yang, Xingyu Li

Normalization methods improve both optimization and generalization of ConvNets. To further boost performance, the recently-proposed switchable normalization (SN) provides a new perspective for deep learning: it learns to select different…

Computer Vision and Pattern Recognition · Computer Science 2019-03-12 Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition. However, the intra- and inter-class objectives in the softmax loss are entangled, therefore a…

Computer Vision and Pattern Recognition · Computer Science 2020-02-13 Lanqing He, Zhongdao Wang, Yali Li, Shengjin Wang

The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture. However, the underlying learning dynamics that…

Machine Learning · Computer Science 2026-01-27 Yang Cao, Yingyu Liang, Zhenmei Shi, Zhao Song

A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to…

Artificial Intelligence · Computer Science 2017-06-15 Kavosh Asadi, Michael L. Littman

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the…

Computation and Language · Computer Science 2017-06-20 Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Faster inference of deep learning models is highly demanded on edge devices and even servers, for both financial and environmental reasons. To address this issue, we propose SoftNeuro, a novel, high-performance inference framework with…

Machine Learning · Computer Science 2021-10-13 Masaki Hilaga, Yasuhiro Kuroda, Hitoshi Matsuo, Tatsuya Kawaguchi, Gabriel Ogawa, Hiroshi Miyake, Yusuke Kozawa

The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in…

Performance · Computer Science 2020-01-14 Marat Dukhan, Artsiom Ablavatski

This short paper discusses an efficient implementation of sampled softmax loss for TensorFlow. The speedup over the default implementation is achieved by simplifying the graph for the forward and backward passes.

Machine Learning · Computer Science 2020-04-14 Maciej Skorski

The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn