Related papers: Structured Initialization for Vision Transformers

Structured Initialization for Attention in Vision Transformers

The training of vision transformer (ViT) networks on small-scale datasets poses a significant challenge. By contrast, convolutional neural networks (CNNs) have an architectural inductive bias enabling them to perform well on such problems.…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Jianqiao Zheng, Xueqian Li, Simon Lucey

Convolutional Initialization for Data-Efficient Vision Transformers

Training vision transformer networks on small datasets poses challenges. In contrast, convolutional neural networks (CNNs) can achieve state-of-the-art performance by leveraging their architectural inductive bias. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-24 · Jianqiao Zheng, Xueqian Li, Simon Lucey

How to Train Vision Transformer on Small-scale Datasets?

Vision Transformer (ViT), a radically different architecture from convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Hanan Gani, Muzammal Naseer, Mohammad Yaqub

Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training

Recently, vision Transformers (ViTs) are developing rapidly and starting to challenge the domination of convolutional neural networks (CNNs) in the realm of computer vision (CV). With the general-purpose Transformer architecture replacing…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Haofei Zhang, Jiarui Duan, Mengqi Xue, Jie Song, Li Sun, Mingli Song

Enhancing Performance of Vision Transformers on Small Datasets through Local Inductive Bias Incorporation

Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) when trained from scratch on smaller datasets, possibly due to a lack of local inductive bias…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-16 · Ibrahim Batuhan Akkaya, Senthilkumar S. Kathiresan, Elahe Arani, Bahram Zonooz

Transformed CNNs: recasting pre-trained convolutional layers with self-attention

Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a…

Machine Learning · Computer Science · 2021-06-11 · Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

Improving Vision Transformers for Incremental Learning

This paper proposes a working recipe for using Vision Transformer (ViT) in class incremental learning. Although this recipe only combines existing techniques, developing the combination is not trivial. Firstly, naive application of ViT to…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-19 · Pei Yu, Yinpeng Chen, Ying Jin, Zicheng Liu

Convolutional Embedding Makes Hierarchical Vision Transformer Stronger

Vision Transformers (ViTs) have recently dominated a range of computer vision tasks, yet they suffer from low training data efficiency and inferior local semantic representation capability without appropriate inductive bias. Convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-02 · Cong Wang, Hongmin Xu, Xiong Zhang, Li Wang, Zhitong Zheng, Haifeng Liu

Vision Transformers provably learn spatial structure

Vision Transformers (ViTs) have achieved performance comparable or superior to that of Convolutional Neural Networks (CNNs) in computer vision. This empirical breakthrough is even more remarkable since, in contrast to CNNs, ViTs do not embed any…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-18 · Samy Jelassi, Michael E. Sander, Yuanzhi Li

Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets

A large performance gap still remains between Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) when training from scratch on small datasets, which has been attributed to the lack of inductive bias. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-01-02 · Zhiying Lu, Hongtao Xie, Chuanbin Liu, Yongdong Zhang

Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling

There are two de facto standard architectures in recent computer vision: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Strong inductive biases of convolutions help the model learn sample-efficiently, but such strong…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-05 · Yunsung Lee, Gyuseong Lee, Kwangrok Ryoo, Hyojun Go, Jihye Park, Seungryong Kim

HSViT: Horizontally Scalable Vision Transformer

Due to its deficiency in prior knowledge (inductive bias), Vision Transformer (ViT) requires pre-training on large-scale datasets to perform well. Moreover, the growing layers and parameters in ViT models impede their applicability to…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-17 · Chenhao Xu, Chang-Tsun Li, Chee Peng Lim, Douglas Creighton

IBiT: Utilizing Inductive Biases to Create a More Data Efficient Attention Mechanism

In recent years, Transformer-based architectures have become the dominant method for Computer Vision applications. While Transformers are explainable and scale well with dataset size, they lack the inductive biases of Convolutional Neural…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Adithya Giri

Training Vision Transformers with Only 2040 Images

Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition. They achieve competitive results with CNNs, but the lack of the typical convolutional inductive bias makes them more…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-27 · Yun-Hao Cao, Hao Yu, Jianxin Wu

Efficient Training of Visual Transformers with Small Datasets

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Unlike CNNs, VTs can capture global relations between image elements and they potentially have a larger…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-16 · Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai

Surface Analysis with Vision Transformers

The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of those methods have shown design limitations resulting in poor modelling of long-range…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-01 · Simon Dahan, Logan Z. J. Williams, Abdulah Fawaz, Daniel Rueckert, Emma C. Robinson

Vision Transformers in Precision Agriculture: A Comprehensive Survey

Detecting plant diseases is a crucial aspect of modern agriculture, as it plays a key role in maintaining crop health and increasing overall yield. Traditional approaches, though still valuable, often rely on manual inspection or…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-16 · Saber Mehdipour, Seyed Abolghasem Mirroshandel, Seyed Amirhossein Tabatabaei

Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review

Our review explores the comparative analysis between Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in the domain of image classification, with a particular focus on clothing classification within the e-commerce sector.…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Sonia Bbouzidi, Ghazala Hcini, Imen Jdey, Fadoua Drira

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-07 · Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-03 · Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz