Related papers: Optimizing Vision Transformers with Data-Free Know…

Knowledge Distillation in Vision Transformers: A Critical Review

In Natural Language Processing (NLP), Transformers have already revolutionized the field by utilizing an attention-based encoder-decoder model. Recently, some pioneering works have employed Transformer-like architectures in Computer Vision…

Computer Vision and Pattern Recognition · Computer Science 2024-02-13 Gousia Habib, Tausifa Jan Saleem, Brejesh Lall

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks. In ViT, we divide the input image into patch tokens and process them through a stack of self attention blocks. However, unlike Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, R. Venkatesh Babu

Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

Vision Transformers (ViTs) are becoming a more popular and dominant technique for various vision tasks compared to Convolutional Neural Networks (CNNs). As a demanding technique in computer vision, ViTs have successfully solved various…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Khawar Islam

Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation

In this paper, we tackle a new problem: how to transfer knowledge from a pre-trained, cumbersome yet well-performing CNN-based model to learn a compact Vision Transformer (ViT)-based model while maintaining its learning capacity? Due to the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Xu Zheng, Yunhao Luo, Pengyuan Zhou, Lin Wang

Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation

In the past few years, transformers have achieved promising performance on various computer vision tasks. Unfortunately, the immense inference overhead of most existing vision transformers prevents them from being deployed on edge…

Computer Vision and Pattern Recognition · Computer Science 2022-06-03 Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang

Unified Visual Transformer Compression

Vision transformers (ViTs) have gained popularity recently. Even without customized image operators such as convolutions, ViTs can yield competitive performance when properly trained on massive data. However, the computational overhead of…

Machine Learning · Computer Science 2022-03-17 Shixing Yu, Tianlong Chen, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Liu, Zhangyang Wang

HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification

Vision Transformers (ViTs) have achieved significant advancement in computer vision tasks due to their powerful modeling capacity. However, their performance notably degrades when trained with insufficient data due to lack of inherent…

Image and Video Processing · Electrical Eng. & Systems 2025-03-04 Omar S. EL-Assiouti, Ghada Hamed, Dina Khattab, Hala M. Ebied

Attention Distillation: self-supervised vision transformer students need more guidance

Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory- and compute-constrained devices is therefore an important research topic. However, how to distill…

Computer Vision and Pattern Recognition · Computer Science 2022-10-04 Kai Wang, Fei Yang, Joost van de Weijer

Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers

Assessing the forensic value of hand images involves the use of unique features and patterns present in an individual's hand. The human hand has distinct characteristics, such as the pattern of veins, fingerprints, and the geometry of the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-21 Thanh Thi Nguyen, Campbell Wilson, Janis Dalins

Self-Distilled Vision Transformer for Domain Generalization

In the recent past, several domain generalization (DG) methods have been proposed, showing encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There is little to no progress on studying the DG…

Computer Vision and Pattern Recognition · Computer Science 2022-10-06 Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Do Vision Transformers See Like Convolutional Neural Networks?

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This…

Computer Vision and Pattern Recognition · Computer Science 2022-03-07 Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Enhancing Performance of Vision Transformers on Small Datasets through Local Inductive Bias Incorporation

Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) when trained from scratch on smaller datasets, possibly due to a lack of local inductive bias…

Computer Vision and Pattern Recognition · Computer Science 2023-05-16 Ibrahim Batuhan Akkaya, Senthilkumar S. Kathiresan, Elahe Arani, Bahram Zonooz

Distilling Knowledge from CNN-Transformer Models for Enhanced Human Action Recognition

This paper presents a study on improving human action recognition through knowledge distillation and the combination of CNN and ViT models. The research aims to enhance the performance and efficiency of smaller student…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Hamid Ahmadabadi, Omid Nejati Manzari, Ahmad Ayatollahi

Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge

This paper discusses four facets of the Knowledge Distillation (KD) process for Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures, particularly when executed on edge devices with constrained processing…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 John Violos, Symeon Papadopoulos, Ioannis Kompatsiaris

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-06 Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou

DeepViT: Towards Deeper Vision Transformer

Vision transformers (ViTs) have been successfully applied in image classification tasks recently. In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers

While feature-based knowledge distillation has proven highly effective for compressing CNNs, these techniques unexpectedly fail when applied to Vision Transformers (ViTs), often performing worse than simple logit-based distillation. We…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Huiyuan Tian, Bonan Xu, Shijian Li

Vision-TTT: Efficient and Expressive Visual Representation Learning with Test-Time Training

Learning efficient and expressive visual representation has long been the pursuit of computer vision research. While Vision Transformers (ViTs) gradually replace traditional Convolutional Neural Networks (CNNs) as more scalable vision…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Quan Kong, Yanru Xiao, Yuhao Shen, Cong Wang

ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain

Transformer design is the de facto standard for natural language processing tasks. The success of the transformer design in natural language processing has lately piqued the interest of researchers in the domain of computer vision. When…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Md Sohag Mia, Abu Bakor Hayat Arnob, Abdu Naim, Abdullah Al Bary Voban, Md Shariful Islam

Vision Transformers for Small Histological Datasets Learned through Knowledge Distillation

Computational Pathology (CPATH) systems have the potential to automate diagnostic tasks. However, the artifacts on the digitized histological glass slides, known as Whole Slide Images (WSIs), may hamper the overall performance of CPATH…

Computer Vision and Pattern Recognition · Computer Science 2023-05-30 Neel Kanwal, Trygve Eftestol, Farbod Khoraminia, Tahlita CM Zuiverloon, Kjersti Engan