Related papers: Reversible Vision Transformers

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

Image Recognition with Online Lightweight Vision Transformer: A Survey

The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Zherui Zhang, Rongtao Xu, Jie Zhou, Changwei Wang, Xingtian Pei, Wenhao Xu, Jiguang Zhang, Li Guo, Longxiang Gao, Wenbo Xu, Shibiao Xu

Transformers in Vision: A Survey

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies…

Computer Vision and Pattern Recognition · Computer Science 2022-01-20 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Reversible designs for extreme memory cost reduction of CNN training

Training Convolutional Neural Networks (CNNs) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost associated with storing the…

Computer Vision and Pattern Recognition · Computer Science 2019-10-25 Tristan Hascoet, Quentin Febvre, Yasuo Ariki, Tetsuya Takiguchi

Vision Xformers: Efficient Attention for Image Classification

Although transformers have become the neural architectures of choice for natural language processing, they require orders of magnitude more training data, GPU memory, and computations in order to compete with convolutional neural networks…

Computer Vision and Pattern Recognition · Computer Science 2021-10-04 Pranav Jeevan, Amit Sethi

ResT: An Efficient Transformer for Visual Recognition

This paper presents an efficient multi-scale vision Transformer, called ResT, that capably serves as a general-purpose backbone for image recognition. Unlike existing Transformer methods, which employ standard Transformer blocks to tackle…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Qinglong Zhang, Yubin Yang

Improving the Efficiency of Transformers for Resource-Constrained Devices

Transformers provide promising accuracy and have become popular and used in various domains such as natural language processing and computer vision. However, due to their massive number of model parameters, memory and computation…

Machine Learning · Computer Science 2021-07-01 Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani, Bahram Zonooz

Recurrent Vision Transformers for Object Detection with Event Cameras

We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras. Event cameras provide visual information with sub-millisecond latency at a high-dynamic range and with strong robustness against…

Computer Vision and Pattern Recognition · Computer Science 2023-05-26 Mathias Gehrig, Davide Scaramuzza

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

Vision Transformers achieve impressive accuracy across a range of visual recognition tasks. Unfortunately, their accuracy frequently comes with high computational costs. This is a particular issue in video recognition, where models are…

Computer Vision and Pattern Recognition · Computer Science 2023-08-28 Matthew Dutson, Yin Li, Mohit Gupta

Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions

The tradeoff between reconstruction quality and compute required for video super-resolution (VSR) remains a formidable challenge in its adoption for deployment on resource-constrained edge devices. While transformer-based VSR models have…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Kavitha Viswanathan, Shashwat Pathak, Piyush Bharambe, Harsh Choudhary, Amit Sethi

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

A Partially Reversible U-Net for Memory-Efficient Volumetric Image Segmentation

One of the key drawbacks of 3D convolutional neural networks for segmentation is their memory footprint, which necessitates compromises in the network architecture in order to fit into a given memory budget. Motivated by the RevNet for…

Computer Vision and Pattern Recognition · Computer Science 2019-06-21 Robin Brügger, Christian F. Baumgartner, Ender Konukoglu

Vision Transformer Computation and Resilience for Dynamic Inference

State-of-the-art deep learning models for computer vision tasks are based on the transformer architecture and often deployed in real-time applications. In this scenario, the resources available for every inference can vary, so it is useful…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, Mark Horowitz

Efficiency 360: Efficient Vision Transformers

Transformers are widely used for solving tasks in natural language processing, computer vision, speech, and music domains. In this paper, we talk about the efficiency of transformers in terms of memory (the number of parameters),…

Computer Vision and Pattern Recognition · Computer Science 2023-02-27 Badri N. Patro, Vijay Srinivas Agneeswaran

Visformer: The Vision-friendly Transformer

The past year has witnessed the rapid development of applying the Transformer module to vision problems. While some researchers have demonstrated that Transformer-based models enjoy a favorable ability of fitting data, there are still…

Computer Vision and Pattern Recognition · Computer Science 2021-12-21 Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian

Vision Transformers for Dense Prediction

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into…

Computer Vision and Pattern Recognition · Computer Science 2021-03-26 René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

ResT V2: Simpler, Faster and Stronger

This paper proposes ResTv2, a simpler, faster, and stronger multi-scale vision Transformer for visual recognition. ResTv2 simplifies the EMSA structure in ResTv1 (i.e., eliminating the multi-head interaction part) and employs an upsample…

Computer Vision and Pattern Recognition · Computer Science 2022-09-28 Qing-Long Zhang, Yu-Bin Yang

Super Vision Transformer

We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically in the token number. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

Understanding Transformer-based Vision Models through Inversion

Understanding the mechanisms underlying deep neural networks remains a fundamental challenge in machine learning and computer vision. One promising, yet only preliminarily explored approach, is feature inversion, which attempts to…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Jan Rathjens, Shirin Reyhanian, David Kappel, Laurenz Wiskott