Related papers: TransAxx: Efficient Transformers with Approximate …

Searching for Efficient Multi-Stage Vision Transformers

Vision Transformer (ViT) demonstrates that Transformer for natural language processing can be applied to computer vision tasks and result in comparable performance to convolutional neural networks (CNN), which have been studied and adopted…

Computer Vision and Pattern Recognition · Computer Science 2021-09-03 Yi-Lun Liao, Sertac Karaman, Vivienne Sze

AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch

Current state-of-the-art employs approximate multipliers to address the highly increased power demands of DNN accelerators. However, evaluating the accuracy of approximate DNNs is cumbersome due to the lack of adequate support for…

Machine Learning · Computer Science 2022-10-13 Dimitrios Danopoulos, Georgios Zervakis, Kostas Siozios, Dimitrios Soudris, Jörg Henkel

Vision Xformers: Efficient Attention for Image Classification

Although transformers have become the neural architectures of choice for natural language processing, they require orders of magnitude more training data, GPU memory, and computations in order to compete with convolutional neural networks…

Computer Vision and Pattern Recognition · Computer Science 2021-10-04 Pranav Jeevan, Amit Sethi

Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey

Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high…

Machine Learning · Computer Science 2024-05-02 Dayou Du, Gu Gong, Xiaowen Chu

Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies

In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation. Unlike convolutional neural networks (CNNs), which…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Shaibal Saha, Lanyu Xu

X-ViT: High Performance Linear Vision Transformer without Softmax

Vision transformers have become one of the most important models for computer vision tasks. Although they outperform prior works, they require heavy computational resources on a scale that is quadratic to the number of tokens, $N$. This is…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Jeonggeun Song, Heung-Chang Lee

TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing

Due to the high computation overhead of Vision Transformers (ViTs), In-memory Computing architectures are being researched towards energy-efficient deployment in edge-computing scenarios. Prior works have proposed efficient…

Artificial Intelligence · Computer Science 2024-08-26 Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda

Refining Datapath for Microscaling ViTs

Vision Transformers (ViTs) leverage the transformer architecture to effectively capture global context, demonstrating strong performance in computer vision tasks. A major challenge in ViT hardware acceleration is that the model family…

Hardware Architecture · Computer Science 2025-06-17 Can Xiao, Jianyi Cheng, Aaron Zhao

ConcatPlexer: Additional Dim1 Batching for Faster ViTs

Transformers have demonstrated tremendous success not only in the natural language processing (NLP) domain but also the field of computer vision, igniting various creative approaches and applications. Yet, the superior performance and…

Computer Vision and Pattern Recognition · Computer Science 2024-02-01 Donghoon Han, Seunghyeon Seo, Donghyeon Jeon, Jiho Jang, Chaerin Kong, Nojun Kwak

ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain

Transformer design is the de facto standard for natural language processing tasks. The success of the transformer design in natural language processing has lately piqued the interest of researchers in the domain of computer vision. When…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Md Sohag Mia, Abu Bakor Hayat Arnob, Abdu Naim, Abdullah Al Bary Voban, Md Shariful Islam

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational…

Computer Vision and Pattern Recognition · Computer Science 2024-12-04 Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan

A Comparative Study of Vision Transformers and CNNs for Few-Shot Rigid Transformation and Fundamental Matrix Estimation

Vision-transformers (ViTs) and large-scale convolution-neural-networks (CNNs) have reshaped computer vision through pretrained feature representations that enable strong transfer learning for diverse tasks. However, their efficiency as…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Alon Kaya, Igal Bilik, Inna Stainvas

TransMix: Attend to Mix for Vision Transformers

Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs) since they can easily overfit. However, previous mixup-based methods have an underlying prior…

Computer Vision and Pattern Recognition · Computer Science 2021-11-19 Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

Convolutional Xformers for Vision

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reasons for their limited use include their need for larger training datasets and…

Computer Vision and Pattern Recognition · Computer Science 2022-01-26 Pranav Jeevan, Amit Sethi

Super Vision Transformer

We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically in the token number. We present a novel training paradigm that trains only one ViT model at a time, but is capable of providing…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

ViTA: A Vision Transformer Inference Accelerator for Edge Applications

Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks due to their ability to capture the global relation between features which leads to…

Hardware Architecture · Computer Science 2023-09-13 Shashank Nag, Gourav Datta, Souvik Kundu, Nitin Chandrachoodan, Peter A. Beerel

Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

Vision Transformers (ViTs) are becoming a more popular and dominant technique for various vision tasks compared to Convolutional Neural Networks (CNNs). As a demanding technique in computer vision, ViTs have successfully solved various…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Khawar Islam

Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning

Recently, foundation models based on Vision Transformers (ViTs) have become widely available. However, their fine-tuning process is highly resource-intensive, and it hinders their adoption in several edge or low-energy applications. To this…

Computer Vision and Pattern Recognition · Computer Science 2024-08-19 Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, Simone Scardapane

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Comprehensive Survey of Model Compression and Speed up for Vision Transformers

Vision Transformers (ViT) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks. However, their practical deployment is hampered by high computational and memory demands. This study…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Feiyang Chen, Ziqian Luo, Lisang Zhou, Xueting Pan, Ying Jiang