Related papers: Fine-tuning Image Transformers using Learnable Mem…

Memory Transformer

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev , Yuri Kuratov , Anton Peganov , Grigory V. Sapunov

Memory-Efficient Incremental Learning Through Feature Adaptation

We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes, instead of the images themselves, unlike most existing work. Keeping the much lower-dimensional feature…

Computer Vision and Pattern Recognition · Computer Science 2020-08-26 Ahmet Iscen , Jeffrey Zhang , Svetlana Lazebnik , Cordelia Schmid

Memorizing Transformers

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus…

Machine Learning · Computer Science 2022-03-18 Yuhuai Wu , Markus N. Rabe , DeLesley Hutchins , Christian Szegedy

Efficient Visual Transformer by Learnable Token Merging

Self-attention and transformers have been widely used in deep learning. Recent efforts have been devoted to incorporating transformer blocks into different neural architectures, including those with convolutions, leading to various visual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Yancheng Wang , Yingzhen Yang

Evolving Attention with Residual Convolutions

Transformer is a ubiquitous model for natural language processing and has attracted wide attentions in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang , Yaming Yang , Jiangang Bai , Mingliang Zhang , Jing Bai , Jing Yu , Ce Zhang , Gao Huang , Yunhai Tong

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Michael S. Ryoo , AJ Piergiovanni , Anurag Arnab , Mostafa Dehghani , Anelia Angelova

Memory Efficient Continual Learning with Transformers

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is…

Computation and Language · Computer Science 2023-01-16 Beyza Ermis , Giovanni Zappella , Martin Wistuba , Aditya Rawal , Cedric Archambeau

Memory-Efficient Fine-Tuning of Transformers via Token Selection

Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing…

Computation and Language · Computer Science 2025-02-03 Antoine Simoulin , Namyong Park , Xiaoyi Liu , Grey Yang

Continual Learning with Pretrained Backbones by Tuning in the Input Space

The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. This issue is critical in practical supervised learning settings, such as the ones in…

Machine Learning · Computer Science 2023-06-09 Simone Marullo , Matteo Tiezzi , Marco Gori , Stefano Melacci , Tinne Tuytelaars

Reducing catastrophic forgetting of incremental learning in the absence of rehearsal memory with task-specific token

Deep learning models generally display catastrophic forgetting when learning new data continuously. Many incremental learning approaches address this problem by reusing data from previous tasks while learning new tasks. However, the direct…

Machine Learning · Computer Science 2024-11-12 Young Jo Choi , Min Kyoon Yoo , Yu Rang Park

Growing a Brain: Fine-Tuning by Increasing Model Capacity

CNNs have made an undeniable impact on computer vision through the ability to learn high-capacity models with large annotated training sets. One of their remarkable properties is the ability to transfer knowledge from a large source dataset…

Computer Vision and Pattern Recognition · Computer Science 2019-07-19 Yu-Xiong Wang , Deva Ramanan , Martial Hebert

On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting. However, there is a tradeoff between the number of…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Thomas De Min , Massimiliano Mancini , Karteek Alahari , Xavier Alameda-Pineda , Elisa Ricci

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

Retrieval augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems. The goal is to enhance the recognition capabilities of the model by retrieving similar examples for the…

Computer Vision and Pattern Recognition · Computer Science 2023-04-12 Ahmet Iscen , Alireza Fathi , Cordelia Schmid

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Hugo Touvron , Matthieu Cord , Alaaeldin El-Nouby , Jakob Verbeek , Hervé Jégou

Augmenting Self-attention with Persistent Memory

Transformer networks have lead to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long…

Machine Learning · Computer Science 2019-07-03 Sainbayar Sukhbaatar , Edouard Grave , Guillaume Lample , Herve Jegou , Armand Joulin

Enhancing Latent Computation in Transformers with Latent Tokens

Augmenting large language models (LLMs) with auxiliary tokens has emerged as a promising strategy for enhancing model performance. In this work, we introduce a lightweight method termed latent tokens; these are dummy tokens that may be…

Machine Learning · Computer Science 2025-05-20 Yuchang Sun , Yanxi Chen , Yaliang Li , Bolin Ding

Bridging The Gaps Between Token Pruning and Full Pre-training via Masked Fine-tuning

Despite the success of transformers on various computer vision tasks, they suffer from excessive memory and computational cost. Some works present dynamic vision transformers to accelerate inference by pruning redundant tokens. A key to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Fengyuan Shi , Limin Wang

Enhancing Transformers Through Conditioned Embedded Tokens

Transformers have transformed modern machine learning, driving breakthroughs in computer vision, natural language processing, and robotics. At the core of their success lies the attention mechanism, which enables the modeling of global…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Hemanth Saratchandran , Simon Lucey

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations. Is this actually true? We investigate this question and find that the features and representations…

Machine Learning · Computer Science 2024-11-15 Alexander C. Li , Yuandong Tian , Beidi Chen , Deepak Pathak , Xinlei Chen

Token Turing Machines

We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory…

Machine Learning · Computer Science 2023-04-14 Michael S. Ryoo , Keerthana Gopalakrishnan , Kumara Kahatapitiya , Ted Xiao , Kanishka Rao , Austin Stone , Yao Lu , Julian Ibarz , Anurag Arnab