Related papers: Structured Initialization for Vision Transformers

200 papers

The training of vision transformer (ViT) networks on small-scale datasets poses a significant challenge. By contrast, convolutional neural networks (CNNs) have an architectural inductive bias enabling them to perform well on such problems.…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Jianqiao Zheng, Xueqian Li, Simon Lucey

Training vision transformer networks on small datasets poses challenges. In contrast, convolutional neural networks (CNNs) can achieve state-of-the-art performance by leveraging their architectural inductive bias. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-24 · Jianqiao Zheng, Xueqian Li, Simon Lucey

Vision Transformer (ViT), a radically different architecture from convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Hanan Gani, Muzammal Naseer, Mohammad Yaqub

Recently, vision Transformers (ViTs) have been developing rapidly and are starting to challenge the dominance of convolutional neural networks (CNNs) in the realm of computer vision (CV). With the general-purpose Transformer architecture replacing…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Haofei Zhang, Jiarui Duan, Mengqi Xue, Jie Song, Li Sun, Mingli Song

Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) when trained from scratch on smaller datasets, possibly due to a lack of local inductive bias…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-16 · Ibrahim Batuhan Akkaya, Senthilkumar S. Kathiresan, Elahe Arani, Bahram Zonooz

Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a…

Machine Learning · Computer Science · 2021-06-11 · Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

This paper proposes a working recipe for using Vision Transformer (ViT) in class incremental learning. Although this recipe only combines existing techniques, developing the combination is not trivial. Firstly, naive application of ViT to…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-19 · Pei Yu, Yinpeng Chen, Ying Jin, Zicheng Liu

Vision Transformers (ViTs) have recently dominated a range of computer vision tasks, yet they suffer from low training data efficiency and inferior local semantic representation capability without appropriate inductive bias. Convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-02 · Cong Wang, Hongmin Xu, Xiong Zhang, Li Wang, Zhitong Zheng, Haifeng Liu

Vision Transformers (ViTs) have achieved performance comparable or superior to that of Convolutional Neural Networks (CNNs) in computer vision. This empirical breakthrough is even more remarkable since, in contrast to CNNs, ViTs do not embed any…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-18 · Samy Jelassi, Michael E. Sander, Yuanzhi Li

An extreme performance gap remains between Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) when training from scratch on small datasets, which is attributed to the lack of inductive bias. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-01-02 · Zhiying Lu, Hongtao Xie, Chuanbin Liu, Yongdong Zhang

There are two de facto standard architectures in recent computer vision: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Strong inductive biases of convolutions help the model learn sample-efficiently, but such strong…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-05 · Yunsung Lee, Gyuseong Lee, Kwangrok Ryoo, Hyojun Go, Jihye Park, Seungryong Kim

Due to its deficiency in prior knowledge (inductive bias), Vision Transformer (ViT) requires pre-training on large-scale datasets to perform well. Moreover, the growing layers and parameters in ViT models impede their applicability to…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-17 · Chenhao Xu, Chang-Tsun Li, Chee Peng Lim, Douglas Creighton

In recent years, Transformer-based architectures have become the dominant method for Computer Vision applications. While Transformers are explainable and scale well with dataset size, they lack the inductive biases of Convolutional Neural…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Adithya Giri

Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition. They achieve results competitive with CNNs, but the lack of the typical convolutional inductive bias makes them more…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-27 · Yun-Hao Cao, Hao Yu, Jianxin Wu

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Unlike CNNs, VTs can capture global relations between image elements and they potentially have a larger…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-16 · Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai

The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of those methods have shown design limitations resulting in poor modelling of long-range…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-01 · Simon Dahan, Logan Z. J. Williams, Abdulah Fawaz, Daniel Rueckert, Emma C. Robinson

Detecting plant diseases is a crucial aspect of modern agriculture, as it plays a key role in maintaining crop health and increasing overall yield. Traditional approaches, though still valuable, often rely on manual inspection or…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-16 · Saber Mehdipour, Seyed Abolghasem Mirroshandel, Seyed Amirhossein Tabatabaei

Our review explores the comparative analysis between Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in the domain of image classification, with a particular focus on clothing classification within the e-commerce sector.…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Sonia Bbouzidi, Ghazala Hcini, Imen Jdey, Fadoua Drira

Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-07 · Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-03 · Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz