Related papers: Bayesian Attention Modules

Bayesian Attention Belief Networks

Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated…

Machine Learning · Computer Science 2021-06-10 Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

A stochastic model of human visual attention with a dynamic Bayesian network

Recent studies in the field of human vision science suggest that the human responses to the stimuli on a visual display are non-deterministic. People may attend to different locations on the same visual input at the same time. Based on this…

Computer Vision and Pattern Recognition · Computer Science 2015-03-13 Akisato Kimura, Derek Pang, Tatsuto Takeuchi, Kouji Miyazato, Junji Yamato, Kunio Kashino

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic…

Machine Learning · Computer Science 2026-05-12 Akash Yadav, Taiwo A. Adebiyi, Ruda Zhang

Learning Wake-Sleep Recurrent Attention Models

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations. Stochastic attention-based models have been shown to improve computational efficiency at test time, but they…

Machine Learning · Computer Science 2015-09-24 Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Latent Alignment and Variational Attention

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does…

Machine Learning · Statistics 2018-11-09 Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference

The neural attention mechanism plays an important role in many natural language processing applications. In particular, the use of multi-head attention extends single-head attention by allowing a model to jointly attend information from…

Machine Learning · Computer Science 2020-11-03 Bang An, Jie Lyu, Zhenyi Wang, Chunyuan Li, Changwei Hu, Fei Tan, Ruiyi Zhang, Yifan Hu, Changyou Chen

Attention: Marginal Probability is All You Need?

Attention mechanisms are a central property of cognitive systems allowing them to selectively deploy cognitive resources in a flexible manner. Attention has been long studied in the neurosciences and there are numerous phenomenological…

Machine Learning · Computer Science 2023-04-11 Ryan Singh, Christopher L. Buckley

QUEST: A robust attention formulation using query-modulated spherical attention

The Transformer model architecture has become one of the most widely used in deep learning and the attention mechanism is at its core. The standard attention formulation uses a softmax operation applied to a scaled dot product between query…

Machine Learning · Computer Science 2026-04-02 Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Learning to Focus: Focal Attention for Selective and Scalable Transformers

Attention is a core component of the transformer architecture, whether in encoder-only, decoder-only, or encoder-decoder models. However, standard softmax attention often produces a noisy probability distribution, which can impair effective…

Computation and Language · Computer Science 2025-11-11 Dhananjay Ram, Wei Xia, Stefano Soatto

Convolutional Rectangular Attention Module

In this paper, we introduce a novel spatial attention module that can be easily integrated to any convolutional network. This module guides the model to pay attention to the most discriminative part of an image. This enables the model to…

Computer Vision and Pattern Recognition · Computer Science 2025-09-01 Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Xinyang Liu, Dongsheng Wang, Bowei Fang, Miaoge Li, Zhibin Duan, Yishi Xu, Bo Chen, Mingyuan Zhou

BAM: Bottleneck Attention Module

Recent advances in deep neural networks have been developed via architecture search for stronger representational power. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective…

Computer Vision and Pattern Recognition · Computer Science 2018-07-19 Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon

Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation

User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are…

Information Retrieval · Computer Science 2022-04-14 Chao Chen, Haoyu Geng, Nianzu Yang, Junchi Yan, Daiyue Xue, Jianping Yu, Xiaokang Yang

Focal Attention Networks: optimising attention for biomedical image segmentation

In recent years, there has been increasing interest to incorporate attention into deep learning architectures for biomedical image segmentation. The modular design of attention mechanisms enables flexible integration into convolutional…

Image and Video Processing · Electrical Eng. & Systems 2021-11-02 Michael Yeung, Leonardo Rundo, Evis Sala, Carola-Bibiane Schönlieb, Guang Yang

Pre-training Attention Mechanisms

Recurrent neural networks with differentiable attention mechanisms have had success in generative and classification tasks. We show that the classification performance of such models can be enhanced by guiding a randomly initialized model…

Machine Learning · Computer Science 2017-12-18 Jack Lindsey

Attention Normalization Impacts Cardinality Generalization in Slot Attention

Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Markus Krimmel, Jan Achterhold, Joerg Stueckler

Stochastic Clock Attention for Aligning Continuous and Ordered Sequences

We formulate an attention mechanism for continuous and ordered sequences that explicitly functions as an alignment model, which serves as the core of many sequence-to-sequence tasks. Standard scaled dot-product attention relies on…

Machine Learning · Computer Science 2025-09-19 Hyungjoon Soh, Junghyo Jo

Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling

Multi-head attention has each of the attention heads collect salient information from different parts of an input sequence, making it a powerful mechanism for sequence modeling. Multilingual and multi-domain learning are common scenarios…

Computation and Language · Computer Science 2021-06-22 Hongyu Gong, Yun Tang, Juan Pino, Xian Li

Stochastic Approximation Cut Algorithm for Inference in Modularized Bayesian Models

Bayesian modelling enables us to accommodate complex forms of data and make a comprehensive inference, but the effect of partial misspecification of the model is a concern. One approach in this setting is to modularize the model, and…

Methodology · Statistics 2026-03-18 Yang Liu, Robert J. B. Goudie

Sinkformers: Transformers with Doubly Stochastic Attention

Attention based models such as Transformers involve pairwise interactions between data points, modeled with a learnable attention matrix. Importantly, this attention matrix is normalized with the SoftMax operator, which makes it row-wise…

Machine Learning · Computer Science 2022-01-25 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré