Related papers: Finding Differences Between Transformers and ConvN…

Are Transformers More Robust Than CNNs?

Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutions…

Computer Vision and Pattern Recognition · Computer Science 2021-11-11 Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie

An Impartial Take to the CNN vs Transformer Robustness Contest

Following the surge of popularity of Transformers in Computer Vision, several studies have attempted to determine whether they could be more robust to distribution shifts and provide better uncertainty estimates than Convolutional Neural…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Francesco Pinto, Philip H. S. Torr, Puneet K. Dokania

ConvNets vs. Transformers: Whose Visual Representations are More Transferable?

Vision transformers have attracted much attention from computer vision researchers as they are not restricted to the spatial inductive bias of ConvNets. However, although Transformer-based backbones have achieved much progress on ImageNet…

Computer Vision and Pattern Recognition · Computer Science 2021-08-18 Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Yizhou Yu

Understanding Robustness of Transformers for Image Classification

Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image…

Computer Vision and Pattern Recognition · Computer Science 2021-10-11 Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit

Curved Representation Space of Vision Transformers

Neural networks with self-attention (a.k.a. Transformers) like ViT and Swin have emerged as a better alternative to traditional convolutional neural networks (CNNs). However, our understanding of how the new architecture works is still…

Computer Vision and Pattern Recognition · Computer Science 2023-12-15 Juyeop Kim, Junha Park, Songkuk Kim, Jong-Seok Lee

ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases

ConvNets and Imagenet have driven the recent success of deep learning for image classification. However, the marked slowdown in performance improvement combined with the lack of robustness of neural networks to adversarial examples and…

Machine Learning · Computer Science 2018-07-23 Pierre Stock, Moustapha Cisse

Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects

Object pose estimation is important for object manipulation and scene understanding. In order to improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is objects unseen…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Stefan Thalhammer, Jean-Baptiste Weibel, Markus Vincze, Jose Garcia-Rodriguez

Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods

In order to gain insights about the decision-making of different visual recognition backbones, we propose two methodologies, sub-explanation counting and cross-testing, that systematically apply deep explanation algorithms on a…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Mingqi Jiang, Saeed Khorram, Li Fuxin

Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective

The Vision Transformer has emerged as a powerful tool for image classification tasks, surpassing the performance of convolutional neural networks (CNNs). Recently, many researchers have attempted to understand the robustness of Transformers…

Computer Vision and Pattern Recognition · Computer Science 2023-12-18 Gihyun Kim, Juyeop Kim, Jong-Seok Lee

Can CNNs Be More Robust Than Transformers?

The recent success of Vision Transformers is shaking the long dominance of Convolutional Neural Networks (CNNs) in image recognition for a decade. Specifically, in terms of robustness on out-of-distribution samples, recent research finds…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie

Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis

This paper investigates the effectiveness of self-supervised pre-trained vision transformers (ViTs) compared to supervised pre-trained ViTs and conventional neural networks (ConvNets) for detecting facial deepfake images and videos. It…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Does Robustness on ImageNet Transfer to Downstream Tasks?

As clean ImageNet accuracy nears its ceiling, the research community is increasingly more concerned about robust accuracy under distributional shifts. While a variety of methods have been proposed to robustify neural networks, these…

Computer Vision and Pattern Recognition · Computer Science 2022-04-11 Yutaro Yamada, Mayu Otani

A Comparison for Anti-noise Robustness of Deep Learning Classification Methods on a Tiny Object Image Dataset: from Convolutional Neural Network to Visual Transformer and Performer

Image classification has achieved unprecedented advance with the rapid development of deep learning. However, the classification of tiny object images is still not well investigated. In this paper, we first briefly review the…

Computer Vision and Pattern Recognition · Computer Science 2021-06-10 Ao Chen, Chen Li, Haoyuan Chen, Hechen Yang, Peng Zhao, Weiming Hu, Wanli Liu, Shuojia Zou, Marcin Grzegorzek

On Robustness and Transferability of Convolutional Neural Networks

Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts. However, several recent breakthroughs in transfer learning suggest that these networks can cope with severe distribution shifts…

Computer Vision and Pattern Recognition · Computer Science 2021-03-24 Josip Djolonga, Jessica Yung, Michael Tschannen, Rob Romijnders, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Matthias Minderer, Alexander D'Amour, Dan Moldovan, Sylvain Gelly, Neil Houlsby, Xiaohua Zhai, Mario Lucic

Adversarially robust deepfake media detection using fused convolutional neural network predictions

Deepfakes are synthetically generated images, videos or audios, which fraudsters use to manipulate legitimate information. Current deepfake detection systems struggle against unseen data. To address this, we employ three different deep…

Computer Vision and Pattern Recognition · Computer Science 2021-02-12 Sohail Ahmed Khan, Alessandro Artusi, Hang Dai

Generalisation in humans and deep neural networks

We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well known DNNs (ResNet-152, VGG-19, GoogLeNet) we find…

Computer Vision and Pattern Recognition · Computer Science 2020-10-26 Robert Geirhos, Carlos R. Medina Temme, Jonas Rauber, Heiko H. Schütt, Matthias Bethge, Felix A. Wichmann

Locally Scale-Invariant Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) have shown excellent results on many visual classification tasks. With the exception of ImageNet, these datasets are carefully crafted such that objects are well-aligned at similar scales. Naturally,…

Computer Vision and Pattern Recognition · Computer Science 2014-12-17 Angjoo Kanazawa, Abhishek Sharma, David Jacobs

Classifying Deepfakes Using Swin Transformers

The proliferation of deepfake technology poses significant challenges to the authenticity and trustworthiness of digital media, necessitating the development of robust detection methods. This study explores the application of Swin…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 Aprille J. Xi, Eason Chen

Hands-on Evaluation of Visual Transformers for Object Recognition and Detection

Convolutional Neural Networks (CNNs) for computer vision sometimes struggle with understanding images in a global context, as they mainly focus on local patterns. On the other hand, Vision Transformers (ViTs), inspired by models originally…

Computer Vision and Pattern Recognition · Computer Science 2025-12-11 Dimitrios N. Vlachogiannis, Dimitrios A. Koutsomitropoulos

Efficient Training of Visual Transformers with Small Datasets

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations between image elements and they potentially have a larger…

Computer Vision and Pattern Recognition · Computer Science 2021-11-16 Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai