Related papers: Image Transformer

Revisiting Transformers with Insights from Image Filtering and Boosting

The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen

Improved Transformer for High-Resolution GANs

Attention-based models, exemplified by the Transformer, can effectively model long range dependency, but suffer from the quadratic complexity of self-attention operation, making them difficult to be adopted for high-resolution image…

Computer Vision and Pattern Recognition · Computer Science 2021-12-28 Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Self-attention as an attractor network: transient memories without backpropagation

Transformers are one of the most successful architectures of modern neural networks. At their core there is the so-called attention mechanism, which recently interested the physics community as it can be written as the derivative of an…

Machine Learning · Computer Science 2024-09-25 Francesco D'Amico, Matteo Negri

Transformers predicting the future. Applying attention in next-frame and time series forecasting

Recurrent Neural Networks were, until recently, one of the best ways to capture the timely dependencies in sequences. However, with the introduction of the Transformer, it has been proven that an architecture with only attention-mechanisms…

Machine Learning · Computer Science 2021-08-19 Radostin Cholakov, Todor Kolev

Self-Attention with Relative Position Representations

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly…

Computation and Language · Computer Science 2018-04-16 Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

Accurate Image Restoration with Attention Retractable Transformer

Recently, Transformer-based image restoration networks have achieved promising improvements over convolutional neural networks due to parameter-independent global interactions. To lower computational cost, existing works generally limit…

Computer Vision and Pattern Recognition · Computer Science 2023-02-06 Jiale Zhang, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan

A Practical Investigation of Spatially-Controlled Image Generation with Transformers

Enabling image generation models to be spatially controlled is an important area of research, empowering users to better generate images according to their own fine-grained specifications via e.g. edge maps, poses. Although this task has…

Computer Vision and Pattern Recognition · Computer Science 2025-11-05 Guoxuan Xia, Harleen Hanspal, Petru-Daniel Tudosiu, Shifeng Zhang, Sarah Parisot

Linear Log-Normal Attention with Unbiased Concentration

Transformer models have achieved remarkable results in a wide range of applications. However, their scalability is hampered by the quadratic time and memory complexity of the self-attention mechanism concerning the sequence length. This…

Machine Learning · Computer Science 2024-02-27 Yury Nahshan, Joseph Kampeas, Emir Haleva

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Amanuel Tafese Dufera

TransfoRNN: Capturing the Sequential Information in Self-Attention Representations for Language Modeling

In this paper, we describe the use of recurrent neural networks to capture sequential information from the self-attention representations to improve the Transformers. Although self-attention mechanism provides a means to exploit long…

Computation and Language · Computer Science 2021-04-06 Tze Yuang Chong, Xuyang Wang, Lin Yang, Junjie Wang

Restormer: Efficient Transformer for High-Resolution Image Restoration

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural…

Computer Vision and Pattern Recognition · Computer Science 2022-03-14 Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Transformer architecture has emerged to be successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid…

Computer Vision and Pattern Recognition · Computer Science 2021-09-29 Yunhe Gao, Mu Zhou, Dimitris Metaxas

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science 2019-05-28 Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh

Progressive Pose Attention Transfer for Person Image Generation

This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. The generator of the network comprises a sequence of Pose-Attentional Transfer Blocks that each…

Computer Vision and Pattern Recognition · Computer Science 2019-05-14 Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Transformers have attracted increasing interests in computer vision, but they still fall behind state-of-the-art convolutional networks. In this work, we show that while Transformers tend to have larger model capacity, their generalization…

Computer Vision and Pattern Recognition · Computer Science 2021-09-16 Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Exploring Self-attention for Image Recognition

Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of…

Computer Vision and Pattern Recognition · Computer Science 2020-04-29 Hengshuang Zhao, Jiaya Jia, Vladlen Koltun

Transformer-based Image Generation from Scene Graphs

Graph-structured scene descriptions can be efficiently used in generative models to control the composition of the generated image. Previous approaches are based on the combination of graph convolutional networks and adversarial methods for…

Computer Vision and Pattern Recognition · Computer Science 2023-03-09 Renato Sortino, Simone Palazzo, Concetto Spampinato

T-former: An Efficient Transformer for Image Inpainting

Benefiting from powerful convolutional neural networks (CNNs), learning-based image inpainting methods have made significant breakthroughs over the years. However, some nature of CNNs (e.g. local prior, spatially shared parameters) limit…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Ye Deng, Siqi Hui, Sanping Zhou, Deyu Meng, Jinjun Wang

Local-to-Global Self-Attention in Vision Transformers

Transformers have demonstrated great potential in computer vision tasks. To avoid dense computations of self-attentions in high-resolution visual data, some recent Transformer models adopt a hierarchical design, where self-attentions are…

Computer Vision and Pattern Recognition · Computer Science 2021-07-13 Jinpeng Li, Yichao Yan, Shengcai Liao, Xiaokang Yang, Ling Shao

Memory Transformer

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov