Related papers: Efficient Sampled Softmax for Tensorflow

200 papers

Softmax is a popular normalization method used in machine learning. Deep learning solutions like Transformer or BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our methodology…

Mathematical Software · Computer Science 2019-05-28 Jacek Czaja , Michal Gallus , Tomasz Patejko , Jian Tang

Evaluating the log-sum-exp function or the softmax function is a key step in many modern data science algorithms, notably in inference and classification. Because of the exponentials that these functions contain, the evaluation is prone to…

Numerical Analysis · Mathematics 2019-09-10 Pierre Blanchard , Desmond J. Higham , Nicholas J. Higham

Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits. However, by producing a dense probability distribution, each token in the vocabulary has a nonzero chance of being…

Computation and Language · Computer Science 2022-05-20 Maxat Tezekbayev , Vassilina Nikoulina , Matthias Gallé , Zhenisbek Assylbekov

Learning image representations on decentralized data can bring many benefits in cases where data cannot be aggregated across data silos. Softmax cross entropy loss is highly effective and commonly used for learning image representations.…

Machine Learning · Computer Science 2022-03-10 Sagar M. Waghmare , Hang Qi , Huizhong Chen , Mikhail Sirotenko , Tomer Meron

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of…

Machine Learning · Computer Science 2020-01-01 Ankit Singh Rawat , Jiecao Chen , Felix Yu , Ananda Theertha Suresh , Sanjiv Kumar

Transformers have transformed the field of natural language processing. This performance is largely attributed to the use of stacked self-attention layers, each of which consists of matrix multiplies as well as softmax operations. As a…

Hardware Architecture · Computer Science 2021-03-18 Jacob R. Stevens , Rangharajan Venkatesan , Steve Dai , Brucek Khailany , Anand Raghunathan

The softmax function is widely used in artificial neural networks for multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science 2021-12-24 Shaoshi Sun , Zhenyuan Zhang , BoCheng Huang , Pengbin Lei , Jianlin Su , Shengfeng Pan , Jiarun Cao

RNN-Transducer has been one of the promising architectures for end-to-end automatic speech recognition. Although RNN-Transducer has many advantages, including its strong accuracy and streaming-friendly property, its high memory consumption…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-01 Jaesong Lee , Lukas Lee , Shinji Watanabe

Softmax classifiers with a very large number of classes naturally occur in many applications such as natural language processing and information retrieval. The calculation of full softmax is costly from the computational and energy…

Machine Learning · Computer Science 2021-07-30 Shabnam Daghaghi , Tharun Medini , Nicholas Meisburger , Beidi Chen , Mengnan Zhao , Anshumali Shrivastava

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity , Peyton Chandarana , Ramtin Zand

Data selection is designed to accelerate learning with preserved performance. To achieve this, a fundamental thought is to identify informative data samples with significant contributions to the training. In this work, we propose…

Machine Learning · Computer Science 2025-09-30 Ziheng Cheng , Zhong Li , Jiang Bian

The Softmax function is ubiquitous in machine learning, and multiple previous works have suggested faster alternatives for it. In this paper we propose a way to compute classical Softmax with fewer memory accesses and hypothesize that this reduction…

Performance · Computer Science 2018-07-31 Maxim Milakov , Natalia Gimelshein

Predicting the next activity of a running process is an important aspect of process management. Recently, artificial neural networks, so-called deep-learning approaches, have been proposed to address this challenge. This demo paper…

Machine Learning · Computer Science 2017-05-04 Joerg Evermann , Jana-Rebecca Rehse , Peter Fettke

Today, artificial neural networks are one of the major innovators pushing the progress of machine learning. This has particularly affected the development of neural network accelerating hardware. However, since most of these architectures…

Hardware Architecture · Computer Science 2021-02-12 Simon Pfenning , Philipp Holzinger , Marc Reichenbach

TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow has been initially designed for developing Machine Learning (ML)…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-03 Steven W. D. Chien , Stefano Markidis , Vyacheslav Olshevsky , Yaroslav Bulatov , Erwin Laure , Jeffrey S. Vetter

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the…

Computation and Language · Computer Science 2017-06-20 Edouard Grave , Armand Joulin , Moustapha Cissé , David Grangier , Hervé Jégou

The softmax function is a cornerstone of multi-class classification, integral to a wide range of machine learning applications, from large-scale retrieval and ranking models to advanced large language models. However, its computational cost…

Machine Learning · Computer Science 2025-01-16 Jin Chen , Jin Zhang , Xu Huang , Yi Yang , Defu Lian , Enhong Chen

Swift for TensorFlow is a deep learning platform that scales from mobile devices to clusters of hardware accelerators in data centers. It combines a language-integrated automatic differentiation system and multiple Tensor implementations…

Tensor network methods are a conceptually elegant framework for encoding complicated datasets, where high-order tensors are approximated as networks of low-order tensors. In practice, however, the numeric implementation of tensor network…

Quantum Physics · Physics 2019-11-07 Glen Evenbly

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…
