Related papers: Conditional Self-Attention for Query-based Summari…

Context-Aware Self-Attention Networks

Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, self-attention calculates the dependencies between representations without considering the…

Computation and Language · Computer Science 2019-02-18 Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using…

Computation and Language · Computer Science 2019-03-27 Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

Self-Attention-Based Contextual Modulation Improves Neural System Identification

Convolutional neural networks (CNNs) have been shown to be state-of-the-art models for visual cortical neurons. Cortical neurons in the primary visual cortex are sensitive to contextual information mediated by extensive horizontal and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-03 Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

Grouped self-attention mechanism for a memory-efficient Transformer

Time-series data analysis is important because numerous real-world tasks, such as forecasting weather, electricity consumption, and stock markets, involve predicting data that vary over time. Time-series data are generally recorded over a long…

Machine Learning · Computer Science 2022-10-07 Bumjun Jung, Yusuke Mukuta, Tatsuya Harada

Selective Attention: Enhancing Transformer through Principled Context Control

The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same…

Machine Learning · Computer Science 2024-11-21 Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak

SAMSA: Efficient Transformer for Many Data Modalities

The versatility of the self-attention mechanism has earned transformers great success in almost all data modalities, despite limitations in quadratic complexity and difficulty of training. Efficient transformers, on the other hand, often rely on…

Machine Learning · Computer Science 2024-08-20 Minh Lenhat, Viet Anh Nguyen, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son Hy

Causal Interpretation of Self-Attention in Pre-Trained Transformers

We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The…

Artificial Intelligence · Computer Science 2023-11-01 Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov

SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering

Conversational question answering (CQA) is a novel QA task that requires understanding of dialogue context. Different from traditional single-turn machine reading comprehension (MRC) tasks, CQA includes passage comprehension, coreference…

Computation and Language · Computer Science 2019-01-04 Chenguang Zhu, Michael Zeng, Xuedong Huang

Cross-Modal Self-Attention Network for Referring Image Segmentation

We consider the problem of referring image segmentation. Given an input image and a natural language expression, the goal is to segment the object referred to by the language expression in the image. Existing works in this area treat the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Linwei Ye, Mrigank Rochan, Zhi Liu, Yang Wang

Position-Aware Self-Attention based Neural Sequence Labeling

Sequence labeling is a fundamental task in natural language processing and has been widely studied. Recently, RNN-based sequence labeling models have increasingly gained attention. Despite superior performance achieved by learning the long…

Computation and Language · Computer Science 2021-10-19 Wei Wei, Zanbo Wang, Xianling Mao, Guangyou Zhou, Pan Zhou, Sheng Jiang

Class-Specific Attention (CSA) for Time-Series Classification

Most neural network-based classifiers extract features using several hidden layers and make predictions at the output layer by utilizing these extracted features. We observe that not all features are equally pronounced in all classes; we…

Machine Learning · Computer Science 2022-11-22 Yifan Hao, Huiping Cao, K. Selcuk Candan, Jiefei Liu, Huiying Chen, Ziwei Ma

Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models

Large Language Models (LLMs) possess remarkable generalization capabilities but struggle with multi-task adaptation, particularly in balancing knowledge retention with task-specific specialization. Conventional fine-tuning methods suffer…

Artificial Intelligence · Computer Science 2025-10-21 Dayan Pan, Zhaoyang Fu, Jingyuan Wang, Xiao Han, Yue Zhu, Xiangyu Zhao

Exclusive Self Attention

We introduce exclusive self attention (XSA), a simple modification of self attention (SA) that improves Transformer's sequence modeling performance. The key idea is to constrain attention to capture only information orthogonal to the…

Machine Learning · Computer Science 2026-03-11 Shuangfei Zhai

SAM: A Self-adaptive Attention Module for Context-Aware Recommendation System

Recently, textual information has been shown to play a positive role in recommendation systems. However, most of the existing methods only focus on representation learning of textual information in ratings, while potential selection bias…

Information Retrieval · Computer Science 2021-10-14 Jiabin Liu, Zheng Wei, Zhengpin Li, Xiaojun Mao, Jian Wang, Zhongyu Wei, Qi Zhang

Self-Attentional Models Application in Task-Oriented Dialogue Generation Systems

Self-attentional models are a new paradigm for sequence modelling tasks which differ from common sequence modelling methods, such as recurrence-based and convolution-based sequence learning, in the way that their architecture is only based…

Computation and Language · Computer Science 2019-09-13 Mansour Saffar Mehrjardi, Amine Trabelsi, Osmar R. Zaiane

VQA with Cascade of Self- and Co-Attention Blocks

The use of complex attention modules has improved the performance of the Visual Question Answering (VQA) task. This work aims to learn an improved multi-modal representation through dense interaction of visual and textual modalities. The…

Computer Vision and Pattern Recognition · Computer Science 2023-03-01 Aakansha Mishra, Ashish Anand, Prithwijit Guha

Contextualized Non-local Neural Networks for Sequence Learning

Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this…

Computation and Language · Computer Science 2018-11-22 Pengfei Liu, Shuaichen Chang, Xuanjing Huang, Jian Tang, Jackie Chi Kit Cheung

Core Context Aware Transformers for Long Context Language Modeling

Transformer-based Large Language Models (LLMs) have exhibited remarkable success across a wide range of tasks, primarily attributed to the self-attention mechanism, which requires a token to consider all preceding tokens as its context to compute…

Computation and Language · Computer Science 2025-08-05 Yaofo Chen, Zeng You, Shuhai Zhang, Haokun Li, Yirui Li, Yaowei Wang, Mingkui Tan

A Context-aware Attention Network for Interactive Question Answering

Neural network based sequence-to-sequence models in an encoder-decoder framework have been successfully applied to solve Question Answering (QA) problems, predicting answers from statements and questions. However, almost all previous models…

Computation and Language · Computer Science 2017-09-05 Huayu Li, Martin Renqiang Min, Yong Ge, Asim Kadav

Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In…

Machine Learning · Computer Science 2022-04-26 Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein