Related papers: Learning distributed representations with efficien…

200 papers

Softmax is a popular normalization method used in machine learning. Deep learning solutions like Transformer or BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our methodology…

Mathematical Software · Computer Science 2019-05-28 Jacek Czaja, Michal Gallus, Tomasz Patejko, Jian Tang

The Softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over…

Machine Learning · Computer Science 2019-05-15 Octavian-Eugen Ganea, Sylvain Gelly, Gary Bécigneul, Aliaksei Severyn

Many loss functions in representation learning are invariant under a continuous symmetry transformation. For example, the loss function of word embeddings (Mikolov et al., 2013) remains unchanged if we simultaneously rotate all word and…

Machine Learning · Statistics 2020-07-21 Robert Bamler, Stephan Mandt

Semantic embeddings to represent objects such as image, text and audio are widely used in machine learning and have spurred the development of vector similarity search methods for retrieving semantically related objects. In this work, we…

Data Structures and Algorithms · Computer Science 2026-01-21 Stephen Mussmann, Mehul Smriti Raje, Kavya Tumkur, Oumayma Messoussi, Cyprien Hachem, Seby Jacob

The softmax representation of probabilities for categorical variables plays a prominent role in modern machine learning with numerous applications in areas such as large scale classification, neural language modeling and recommendation…

Machine Learning · Statistics 2016-11-01 Michalis K. Titsias

Learning image representations on decentralized data can bring many benefits in cases where data cannot be aggregated across data silos. Softmax cross entropy loss is highly effective and commonly used for learning image representations.…

Machine Learning · Computer Science 2022-03-10 Sagar M. Waghmare, Hang Qi, Huizhong Chen, Mikhail Sirotenko, Tomer Meron

Methods for learning word sense embeddings represent a single word with multiple sense-specific vectors. These methods should not only produce interpretable sense embeddings, but should also learn how to select which sense to use in a given…

Computation and Language · Computer Science 2019-12-17 Fenfei Guo, Mohit Iyyer, Jordan Boyd-Graber

Person re-identification is a challenging task because of the high intra-class variance induced by the unrestricted nuisance factors of variations such as pose, illumination, viewpoint, background, and sensor noise. Recent approaches…

Computer Vision and Pattern Recognition · Computer Science 2023-01-02 Sinan Sabri, Zaigham Randhawa, Gianfranco Doretto

The Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel…

Machine Learning · Computer Science 2015-06-25 Jiatao Gu, Victor O. K. Li

In this paper, we present a maximum likelihood estimation approach to determine the value vector in transformer models. We model the sequence of value vectors, key vectors, and the query vector as a sequence of Gaussian distributions. The…

Machine Learning · Computer Science 2025-09-17 Jiyong Ma

Normalization methods improve both optimization and generalization of ConvNets. To further boost performance, the recently-proposed switchable normalization (SN) provides a new perspective for deep learning: it learns to select different…

Computer Vision and Pattern Recognition · Computer Science 2019-03-12 Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Softmax is widely used in neural networks for multiclass classification, gate structures and attention mechanisms. The statistical assumption that the input is normally distributed supports the gradient stability of Softmax. However, when used…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Shulun Wang, Bin Liu, Feng Liu

Normalization is a vital process for any machine learning task as it controls the properties of data and affects model performance at large. The impact of particular forms of normalization, however, has so far been investigated in limited…

Machine Learning · Computer Science 2022-06-22 Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

Distance metric learning (DML) aims to learn embeddings in which examples from the same class are closer than examples from different classes. It can be cast as an optimization problem with triplet constraints. Due to the vast number of…

Computer Vision and Pattern Recognition · Computer Science 2020-04-16 Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and…

Machine Learning · Computer Science 2015-02-24 Yichuan Tang

Word embedding, which encodes words into vectors, is an important starting point in natural language processing and commonly used in many text-based machine learning tasks. However, in most current word embedding approaches, the similarity…

Computation and Language · Computer Science 2018-12-27 Denis Sedov, Zhirong Yang

The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition. However, the intra- and inter-class objectives in the softmax loss are entangled, therefore a…

Computer Vision and Pattern Recognition · Computer Science 2020-02-13 Lanqing He, Zhongdao Wang, Yali Li, Shengjin Wang

Many applications of generative models rely on the marginalization of their high-dimensional output probability distributions. Normalization functions that yield sparse probability distributions can make exact marginalization more…

Machine Learning · Computer Science 2021-10-28 Phil Chen, Masha Itkina, Ransalu Senanayake, Mykel J. Kochenderfer

A neural network regularizer (e.g., weight decay) boosts performance by explicitly penalizing the complexity of a network. In this paper, we penalize inferior network activations -- feature embeddings -- which in turn regularize the…

Computer Vision and Pattern Recognition · Computer Science 2021-03-05 Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis

Embeddings are a basic initial feature extraction step in many machine learning models, particularly in natural language processing. An embedding attempts to map data tokens to a low-dimensional space where similar tokens are mapped to…

Machine Learning · Computer Science 2025-04-10 Golara Ahmadi Azar, Melika Emami, Alyson Fletcher, Sundeep Rangan