Related papers: Efficient Learning for Undirected Topic Models

Relaxed Softmax for learning from Positive and Unlabeled data

In recent years, the softmax model and its fast approximations have become the de facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation,…

Machine Learning · Statistics 2019-09-19 Ugo Tanielian, Flavian Vasile

Unsupervised Embedding Learning via Invariant and Spreading Instance Feature

This paper studies the unsupervised embedding learning problem, which requires an effective similarity measurement between samples in low-dimensional embedding space. Motivated by the positive concentrated and negative separated properties…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Mang Ye, Xu Zhang, Pong C. Yuen, Shih-Fu Chang

Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts

Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets, have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency…

Machine Learning · Computer Science 2024-05-29 Ha Manh Bui, Anqi Liu

Robust Document Representations using Latent Topics and Metadata

Task specific fine-tuning of a pre-trained neural language model using a custom softmax output layer is the de facto approach of late when dealing with document classification problems. This technique is not adequate when labeled examples…

Computation and Language · Computer Science 2020-10-27 Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications

The softmax function is a cornerstone of multi-class classification, integral to a wide range of machine learning applications, from large-scale retrieval and ranking models to advanced large language models. However, its computational cost…

Machine Learning · Computer Science 2025-01-16 Jin Chen, Jin Zhang, Xu Huang, Yi Yang, Defu Lian, Enhong Chen

Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time

Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model named Recurrent Neural Network-Replicated…

Computation and Language · Computer Science 2018-07-10 Pankaj Gupta, Subburam Rajaram, Hinrich Schütze, Bernt Andrassy

Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the…

Computation and Language · Computer Science 2017-06-20 Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Extreme Classification via Adversarial Softmax Approximation

Training a classifier over a large number of classes, known as 'extreme classification', has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost…

Machine Learning · Statistics 2020-02-18 Robert Bamler, Stephan Mandt

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

The remarkable success of contrastive-learning-based multimodal models has been greatly driven by training on ever-larger datasets with expensive compute consumption. Sample selection as an alternative efficient paradigm plays an important…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Finite Sample Bounds for Non-Parametric Regression: Optimal Sample Efficiency and Space Complexity

We address the problem of learning an unknown smooth function and its derivatives from noisy pointwise evaluations under the supremum norm. While classical nonparametric regression provides a strong theoretical foundation, traditional…

Machine Learning · Computer Science 2026-03-10 Davide Maran, Marcello Restelli

Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation

In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. As this noise is heteroscedastic, its effects can be mitigated…

Machine Learning · Computer Science 2022-05-04 Vincent Mai, Kaustubh Mani, Liam Paull

On the Effectiveness of Sampled Softmax Loss for Item Recommendation

The learning objective plays a fundamental role in building a recommender system. Most methods routinely adopt either pointwise or pairwise loss to train the model parameters, while rarely paying attention to softmax loss due to its computational…

Information Retrieval · Computer Science 2023-12-20 Jiancan Wu, Xiang Wang, Xingyu Gao, Jiawei Chen, Hongcheng Fu, Tianyu Qiu

Learning Confidence for Transformer-based Neural Machine Translation

Confidence estimation aims to quantify the confidence of the model prediction, providing an expectation of success. A well-calibrated confidence estimate enables accurate failure prediction and proper risk measurement when given noisy…

Computation and Language · Computer Science 2022-03-23 Yu Lu, Jiali Zeng, Jiajun Zhang, Shuangzhi Wu, Mu Li

Momentum Contrastive Learning with Enhanced Negative Sampling and Hard Negative Filtering

Contrastive learning has become pivotal in unsupervised representation learning, with frameworks like Momentum Contrast (MoCo) effectively utilizing large negative sample sets to extract discriminative features. However, traditional…

Machine Learning · Computer Science 2025-01-29 Duy Hoang, Huy Ngo, Khoi Pham, Tri Nguyen, Gia Bao, Huy Phan

Scalable-Softmax Is Superior for Attention

The maximum element of the vector output by the Softmax function approaches zero as the input vector size increases. Transformer-based language models rely on Softmax to compute attention scores, causing the attention distribution to…

Computation and Language · Computer Science 2025-02-03 Ken M. Nakanishi

Attention as a Perspective for Learning Tempo-invariant Audio Queries

Current models for audio–sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more…

Sound · Computer Science 2018-09-18 Matthias Dorfer, Jan Hajič, Gerhard Widmer

Skim-Aware Contrastive Learning for Efficient Document Representation

Although transformer-based models have shown strong performance in word- and sentence-level tasks, effectively representing long documents, especially in fields like law and medicine, remains difficult. Sparse attention mechanisms can…

Computation and Language · Computer Science 2026-01-01 Waheed Ahmed Abro, Zied Bouraoui

Learning Video Representations using Contrastive Bidirectional Transformer

This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. Our…

Machine Learning · Computer Science 2019-10-01 Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds…

Machine Learning · Statistics 2019-09-06 Xin Bing, Florentina Bunea, Marten Wegkamp

An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information

In this paper, we focus on the problem of unsupervised image-sentence matching. Existing research explores utilizing document-level structural information to sample positive and negative instances for model training. Although the approach…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Zejun Li, Zhongyu Wei, Zhihao Fan, Haijun Shan, Xuanjing Huang