Related papers: EATFormer: Improving Vision Transformer Inspired b…

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model

Inspired by biological evolution, we explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derive that both of them have consistent mathematical representation. Analogous to the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-20 Jiangning Zhang , Chao Xu , Jian Li , Wenzhou Chen , Yabiao Wang , Ying Tai , Shuo Chen , Chengjie Wang , Feiyue Huang , Yong Liu

Improved EATFormer: A Vision Transformer for Medical Image Classification

The accurate analysis of medical images is vital for diagnosing and predicting medical conditions. Traditional approaches relying on radiologists and clinicians suffer from inconsistencies and missed diagnoses. Computer-aided diagnosis…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Yulong Shisu , Susano Mingwin , Yongshuai Wanwag , Zengqiang Chenso , Sunshin Huing

AutoFormer: Searching Transformers for Visual Recognition

Recently, pure transformer-based models have shown great potential for vision tasks such as image classification and detection. However, the design of transformer networks is challenging. It has been observed that the depth, embedding…

Computer Vision and Pattern Recognition · Computer Science 2021-07-02 Minghao Chen , Houwen Peng , Jianlong Fu , Haibin Ling

UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation

Vision Transformers (ViTs) have recently become popular due to their outstanding modeling capabilities, in particular for capturing long-range information, and their scalability to dataset and model sizes, which has led to state-of-the-art…

Image and Video Processing · Electrical Eng. & Systems 2022-04-06 Ali Hatamizadeh , Ziyue Xu , Dong Yang , Wenqi Li , Holger Roth , Daguang Xu

Masked autoencoders are effective solution to transformer data-hungry

Vision Transformers (ViTs) outperform convolutional neural networks (CNNs) in several vision tasks thanks to their global modeling capabilities. However, ViTs lack the inductive bias inherent to convolution, making them require a large amount of…

Computer Vision and Pattern Recognition · Computer Science 2023-01-11 Jiawei Mao , Honggu Zhou , Xuesong Yin , Yuanqi Chang , Binling Nie , Rui Xu

Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers

This research introduces an innovative method for Traffic Sign Recognition (TSR) by leveraging deep learning techniques, with a particular emphasis on Vision Transformers. TSR holds a vital role in advancing driver assistance systems and…

Computer Vision and Pattern Recognition · Computer Science 2024-05-01 Susano Mingwin , Yulong Shisu , Yongshuai Wanwag , Sunshin Huing

MP-Former: Mask-Piloted Transformer for Image Segmentation

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation. The improvement is based on our observation that Mask2Former suffers from inconsistent mask predictions between consecutive decoder…

Computer Vision and Pattern Recognition · Computer Science 2023-03-16 Hao Zhang , Feng Li , Huaizhe Xu , Shijia Huang , Shilong Liu , Lionel M. Ni , Lei Zhang

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Shoufa Chen , Chongjian Ge , Zhan Tong , Jiangliu Wang , Yibing Song , Jue Wang , Ping Luo

EA-ViT: Efficient Adaptation for Elastic Vision Transformer

Vision Transformers (ViTs) have emerged as a foundational model in computer vision, excelling in generalization and adaptation to downstream tasks. However, deploying ViTs to support diverse resource constraints typically requires…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Chen Zhu , Wangbo Zhao , Huiwen Zhang , Samir Khaki , Yuhao Zhou , Weidong Tang , Shuo Wang , Zhihang Yuan , Yuzhang Shang , Xiaojiang Peng , Kai Wang , Dawei Yang

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

In this work, we present Eformer - Edge enhancement based transformer, a novel architecture that builds an encoder-decoder network using transformer blocks for medical image denoising. Non-overlapping window-based self-attention is used in…

Image and Video Processing · Electrical Eng. & Systems 2021-11-10 Achleshwar Luthra , Harsh Sulakhe , Tanish Mittal , Abhishek Iyer , Santosh Yadav

Visformer: The Vision-friendly Transformer

The past year has witnessed the rapid development of applying the Transformer module to vision problems. While some researchers have demonstrated that Transformer-based models enjoy a favorable ability of fitting data, there are still…

Computer Vision and Pattern Recognition · Computer Science 2021-12-21 Zhengsu Chen , Lingxi Xie , Jianwei Niu , Xuefeng Liu , Longhui Wei , Qi Tian

ResFormer: Scaling ViTs with Multi-Resolution Training

Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from vulnerable resolution scalability, i.e., the performance drops drastically when presented with input resolutions that are unseen during training. We…

Computer Vision and Pattern Recognition · Computer Science 2023-04-04 Rui Tian , Zuxuan Wu , Qi Dai , Han Hu , Yu Qiao , Yu-Gang Jiang

Scale-Aware Modulation Meet Transformer

This paper presents a new vision Transformer, Scale-Aware Modulation Transformer (SMT), that can handle various downstream tasks efficiently by combining the convolutional network and vision Transformer. The proposed Scale-Aware Modulation…

Computer Vision and Pattern Recognition · Computer Science 2023-07-27 Weifeng Lin , Ziheng Wu , Jiayu Chen , Jun Huang , Lianwen Jin

BATFormer: Towards Boundary-Aware Lightweight Transformer for Efficient Medical Image Segmentation

Objective: Transformers, born to remedy the inadequate receptive fields of CNNs, have drawn explosive attention recently. However, the daunting computational complexity of global representation learning, together with rigid window…

Computer Vision and Pattern Recognition · Computer Science 2023-04-20 Xian Lin , Li Yu , Kwang-Ting Cheng , Zengqiang Yan

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

It is a challenging task to learn discriminative representation from images and videos, due to large local redundancy and complex global dependency in these visual data. Convolutional neural networks (CNNs) and vision transformers (ViTs) have…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Kunchang Li , Yali Wang , Junhao Zhang , Peng Gao , Guanglu Song , Yu Liu , Hongsheng Li , Yu Qiao

LL-ViT: Edge Deployable Vision Transformers with Look Up Table Neurons

Vision Transformers have been tremendously successful in computer vision tasks. However, their large computational, memory, and energy demands are a challenge for edge inference on FPGAs -- a field that has seen a recent surge in demand. We…

Machine Learning · Computer Science 2026-02-09 Shashank Nag , Alan T. L. Bacellar , Zachary Susskind , Anshul Jha , Logan Liberty , Aishwarya Sivakumar , Eugene B. John , Krishnan Kailas , Priscila M. V. Lima , Neeraja J. Yadwadkar , Felipe M. G. Franca , Lizy K. John

How to Train Vision Transformer on Small-scale Datasets?

Vision Transformer (ViT), a radically different architecture than convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Hanan Gani , Muzammal Naseer , Mohammad Yaqub

ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer

The paper proposes an efficient structure for enhancing the performance of mobile-friendly vision transformer with small computational overhead. The vision transformer (ViT) is very attractive in that it reaches outperforming results in…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Gyeongdong Yang , Yungwook Kwon , Hyunjin Kim

MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation

ViTs are often too computationally expensive to be fitted onto real-world resource-constrained devices, due to (1) their quadratically increased complexity with the number of input tokens and (2) their overparameterized self-attention heads…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Zhongzhi Yu , Yonggan Fu , Sicheng Li , Chaojian Li , Yingyan Lin

AE-Net: Autonomous Evolution Image Fusion Method Inspired by Human Cognitive Mechanism

In order to solve the robustness and generality problems of the image fusion task, inspired by the human brain cognitive mechanism, we propose a robust and general image fusion method with autonomous evolution ability, and is therefore…

Computer Vision and Pattern Recognition · Computer Science 2020-07-20 Aiqing Fang , Xinbo Zhao , Jiaqi Yang , Shihao Cao , Yanning Zhang