Related papers: Affine Self Convolution

Stand-Alone Self-Attention in Vision Models

Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-17 · Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens

Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

When modeling a given type of data, we consider it to involve two key aspects: 1) identifying relevant elements (e.g., image pixels or textual words) to a central element, as in a convolutional receptive field, or to a query element, as in…

Machine Learning · Computer Science · 2025-10-14 · Hehe Fan, Yi Yang, Mohan Kankanhalli, Fei Wu

On the Integration of Self-Attention and Convolution

Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-15 · Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, Gao Huang

Exploring Self-attention for Image Recognition

Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-29 · Hengshuang Zhao, Jiaya Jia, Vladlen Koltun

Pay Less Attention with Lightweight and Dynamic Convolutions

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight…

Computation and Language · Computer Science · 2019-02-26 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Attention-based Image Upsampling

Convolutional layers are an integral part of many deep neural network solutions in computer vision. Recent work shows that replacing the standard convolution operation with mechanisms based on self-attention leads to improved performance on…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-21 · Souvik Kundu, Hesham Mostafa, Sharath Nittur Sridhar, Sairam Sundaresan

Attention Augmented Convolutional Networks

Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information.…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-11 · Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le

Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models

In this paper, we detail the relationship between convolutions and self-attention in natural language tasks. We show that relative position embeddings in self-attention layers are equivalent to recently-proposed dynamic lightweight…

Computation and Language · Computer Science · 2021-06-11 · Tyler A. Chang, Yifan Xu, Weijian Xu, Zhuowen Tu

Evolving Attention with Residual Convolutions

Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science · 2021-02-26 · Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

X-volution: On the unification of convolution and self-attention

Convolution and self-attention are acting as two fundamental building blocks in deep neural networks, where the former extracts local image features in a linear way while the latter non-locally encodes high-order contextual relationships.…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Xuanhong Chen, Hang Wang, Bingbing Ni

A Close Look at Spatial Modeling: From Attention to Convolution

Vision Transformers have shown great promise recently for many vision tasks due to the insightful architecture design and attention mechanism. By revisiting the self-attention responses in Transformers, we empirically observe two…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-27 · Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, Jie Fu, Yun Fu

Scaling Local Self-Attention for Parameter Efficient Visual Backbones

Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling and content-independent interactions…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens

When Medical Imaging Met Self-Attention: A Love Story That Didn't Quite Work Out

A substantial body of research has focused on developing systems that assist medical professionals during labor-intensive early screening processes, many based on convolutional deep-learning architectures. Recently, multiple studies…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-19 · Tristan Piater, Niklas Penzel, Gideon Stein, Joachim Denzler

Attentive Group Equivariant Convolutional Networks

Although group convolutional networks are able to learn powerful representations based on symmetry patterns, they lack explicit means to learn meaningful relationships among them (e.g., relative positions and poses). In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-01 · David W. Romero, Erik J. Bekkers, Jakub M. Tomczak, Mark Hoogendoorn

Convolutional Self-Attention Networks

Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend…

Computation and Language · Computer Science · 2019-04-08 · Baosong Yang, Longyue Wang, Derek Wong, Lidia S. Chao, Zhaopeng Tu

Self-Attention for Audio Super-Resolution

Convolutions operate only locally, thus failing to model global interactions. Self-attention is, however, able to learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio…

Sound · Computer Science · 2021-08-27 · Nathanaël Carraz Rakotonirina

Revisiting Transformers with Insights from Image Filtering and Boosting

The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-10 · Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen

Attention Via Convolutional Nearest Neighbors

The shift from Convolutional Neural Networks to Transformers has reshaped computer vision, yet these two architectural families are typically viewed as fundamentally distinct. We argue that convolution and self-attention, despite their…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-24 · Mingi Kang, Jeová Farias Sales Rocha Neto

A Lightweight Convolution and Vision Transformer integrated model with Multi-scale Self-attention Mechanism

Vision Transformer (ViT) has prevailed in computer vision tasks due to its strong long-range dependency modelling ability. However, its large model size and weak local feature modeling ability hinder its application in real…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-12 · Yi Zhang, Lingxiao Wei, Bowei Zhang, Ziwei Liu, Kai Yi, Shu Hu

Assessing the Impact of Attention and Self-Attention Mechanisms on the Classification of Skin Lesions

Attention mechanisms have raised significant interest in the research community, since they promise significant improvements in the performance of neural network architectures. However, in any specific problem, we still lack a principled…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-24 · Rafael Pedro, Arlindo L. Oliveira