Related papers: Efficient Inter-Task Attention for Multitask Trans…

Transformer-based deep imitation learning for dual-arm robot manipulation

Deep imitation learning is promising for solving dexterous manipulation tasks because it does not require an environment model and pre-programmed robot behavior. However, its application to dual-arm manipulation tasks remains challenging.…

Robotics · Computer Science 2025-05-23 Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi

Horizontal and Vertical Attention in Transformers

Transformers are built upon multi-head scaled dot-product attention and positional encoding, which aim to learn the feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Litao Yu, Jian Zhang

Multi-layer Learnable Attention Mask for Multimodal Tasks

While the Self-Attention mechanism in the Transformer model has proven to be effective in many domains, we observe that it is less effective in more diverse settings (e.g. multimodality) due to the varying granularity of each token and the…

Computer Vision and Pattern Recognition · Computer Science 2024-06-06 Wayner Barrios, SouYoung Jin

Multi-manifold Attention for Vision Transformers

Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Dimitrios Konstantinidis, Ilias Papastratis, Kosmas Dimitropoulos, Petros Daras

Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, their direct application to speech tasks is not trivial. The nature of these sequences carries problems such as…

Computation and Language · Computer Science 2022-05-17 Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-Jussà

Vision Transformer with Deformable Attention

Transformers have recently shown superior performances on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power over their CNN counterparts. Nevertheless, simply…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

Less is More: Pay Less Attention in Vision Transformers

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works…

Computer Vision and Pattern Recognition · Computer Science 2021-12-24 Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

FLatten Transformer: Vision Transformer using Focused Linear Attention

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

Multi-Task Time Series Forecasting With Shared Attention

Time series forecasting is a key component in many industrial and business decision processes and recurrent neural network (RNN) based models have achieved impressive progress on various time series forecasting tasks. However, most of the…

Machine Learning · Computer Science 2021-01-26 Zekai Chen, Jiaze E, Xiao Zhang, Hao Sheng, Xiuzheng Cheng

Couplformer: Rethinking Vision Transformer with Coupling Attention Map

With the development of the self-attention mechanism, the Transformer model has demonstrated its outstanding performance in the computer vision domain. However, the massive computation brought from the full attention mechanism became a…

Computer Vision and Pattern Recognition · Computer Science 2021-12-13 Hai Lan, Xihao Wang, Xian Wei

A Multiscale Visualization of Attention in the Transformer Model

The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach. Besides improving performance, an advantage of using attention is that it can also help to interpret a model…

Human-Computer Interaction · Computer Science 2019-06-14 Jesse Vig

Multi-Scale Self-Attention for Text Classification

In this paper, we introduce the prior knowledge, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on…

Computation and Language · Computer Science 2019-12-03 Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang

A Tensorized Transformer for Language Modeling

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP)…

Computation and Language · Computer Science 2019-11-07 Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou

Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation

Object pose estimation is a long-standing problem in computer vision. Recently, attention-based vision transformer models have achieved state-of-the-art results in many computer vision applications. Exploiting the permutation-invariant…

Computer Vision and Pattern Recognition · Computer Science 2023-12-14 Arul Selvam Periyasamy, Vladimir Tsaturyan, Sven Behnke

Leaner Transformers: More Heads, Less Depth

Transformers have reshaped machine learning by utilizing attention mechanisms to capture complex patterns in large datasets, leading to significant improvements in performance. This success has contributed to the belief that "bigger means…

Machine Learning · Computer Science 2025-05-28 Hemanth Saratchandran, Damien Teney, Simon Lucey

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Transformer-based models, even though achieving super-human performance on several downstream tasks, are often regarded as a black box and used as a whole. It is still unclear what mechanisms they have learned, especially their core module:…

Computation and Language · Computer Science 2023-10-17 Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong

Fair Comparison between Efficient Attentions

Transformers have been successfully used in various fields and are becoming the standard tools in computer vision. However, self-attention, a core component of transformers, has a quadratic complexity problem, which limits the use of…

Computer Vision and Pattern Recognition · Computer Science 2022-06-02 Jiuk Hong, Chaehyeon Lee, Soyoun Bang, Heechul Jung

MLP Can Be A Good Transformer Learner

Self-attention mechanism is the key of the Transformer but often criticized for its computation demands. Previous token pruning works motivate their methods from the view of computation redundancy but still need to load the full network and…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

Learning Hard Retrieval Decoder Attention for Transformers

The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily. The multi-head attention network performs the scaled dot-product attention function in parallel, empowering the model by…

Computation and Language · Computer Science 2021-09-13 Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong

Wide Attention Is The Way Forward For Transformers?

The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach that is building…

Machine Learning · Computer Science 2022-11-10 Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins