Related papers: Vision Pair Learning: An Efficient Training Framew…

Investigating Transfer Learning Capabilities of Vision Transformers and CNNs by Fine-Tuning a Single Trainable Block

In recent developments in the field of Computer Vision, a rise is seen in the use of transformer-based architectures. They are surpassing the state-of-the-art set by CNN architectures in accuracy but on the other hand, they are…

Computer Vision and Pattern Recognition · Computer Science 2021-10-12 Durvesh Malpure, Onkar Litake, Rajesh Ingle

Convolutional Dictionary Pair Learning Network for Image Representation Learning

Both the Dictionary Learning (DL) and Convolutional Neural Networks (CNN) are powerful image representation learning systems based on different mechanisms and principles, however whether we can seamlessly integrate them to improve the…

Computer Vision and Pattern Recognition · Computer Science 2020-01-16 Zhao Zhang, Yulin Sun, Yang Wang, Zhengjun Zha, Shuicheng Yan, Meng Wang

Efficient Training of Visual Transformers with Small Datasets

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations between image elements and they potentially have a larger…

Computer Vision and Pattern Recognition · Computer Science 2021-11-16 Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

Vision-Language Pre-training (VLP) aims to learn multi-modal representations from image-text pairs and serves for downstream vision-language tasks in a fine-tuning fashion. The dominant VLP models adopt a CNN-Transformer architecture, which…

Computer Vision and Pattern Recognition · Computer Science 2021-11-10 Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

Highly-Efficient Binary Neural Networks for Visual Place Recognition

VPR is a fundamental task for autonomous navigation as it enables a robot to localize itself in the workspace when a known location is detected. Although accuracy is an essential requirement for a VPR technique, computational and energy…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

Improving Vision Transformers for Incremental Learning

This paper proposes a working recipe of using Vision Transformer (ViT) in class incremental learning. Although this recipe only combines existing techniques, developing the combination is not trivial. Firstly, naive application of ViT to…

Computer Vision and Pattern Recognition · Computer Science 2022-04-19 Pei Yu, Yinpeng Chen, Ying Jin, Zicheng Liu

Crafting a multi-task CNN for viewpoint estimation

Convolutional Neural Networks (CNNs) were recently shown to provide state-of-the-art results for object category viewpoint estimation. However different ways of formulating this problem have been proposed and the competing approaches have…

Computer Vision and Pattern Recognition · Computer Science 2016-09-14 Francisco Massa, Renaud Marlet, Mathieu Aubry

Robust Classification with Convolutional Prototype Learning

Convolutional neural networks (CNNs) have been widely used for image classification. Despite its high accuracies, CNN has been shown to be easily fooled by some adversarial examples, indicating that CNN is not robust enough for pattern…

Computer Vision and Pattern Recognition · Computer Science 2018-05-10 Hong-Ming Yang, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu

Vision Conformer: Incorporating Convolutions into Vision Transformer Layers

Transformers are popular neural network models that use layers of self-attention and fully-connected nodes with embedded tokens. Vision Transformers (ViT) adapt transformers for image recognition tasks. In order to do this, the images are…

Computer Vision and Pattern Recognition · Computer Science 2023-04-28 Brian Kenji Iwana, Akihiro Kusuda

Do Vision Transformers See Like Convolutional Neural Networks?

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This…

Computer Vision and Pattern Recognition · Computer Science 2022-03-07 Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to…

Machine Learning · Computer Science 2022-05-17 Tianxin Tao, Daniele Reda, Michiel van de Panne

Terrain Classification using Transfer Learning on Hyperspectral Images: A Comparative study

A Hyperspectral image contains much more number of channels as compared to a RGB image, hence containing more information about entities within the image. The convolutional neural network (CNN) and the Multi-Layer Perceptron (MLP) have been…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Uphar Singh, Kumar Saurabh, Neelaksh Trehan, Ranjana Vyas, O. P. Vyas

Exploring Synergistic Ensemble Learning: Uniting CNNs, MLP-Mixers, and Vision Transformers to Enhance Image Classification

In recent years, Convolutional Neural Networks (CNNs), MLP-mixers, and Vision Transformers have risen to prominence as leading neural architectures in image classification. Prior research has underscored the distinct advantages of each…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Mk Bashar, Ocean Monjur, Samia Islam, Mohammad Galib Shams, Niamul Quader

Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers

In this work we propose a novel joint training method for Visual Place Recognition (VPR), which simultaneously learns a global descriptor and a pair classifier for re-ranking. The pair classifier can predict whether a given pair of images…

Robotics · Computer Science 2025-03-04 Stephen Hausler, Peyman Moghadam

A Comparative Study of Vision Transformers and CNNs for Few-Shot Rigid Transformation and Fundamental Matrix Estimation

Vision-transformers (ViTs) and large-scale convolution-neural-networks (CNNs) have reshaped computer vision through pretrained feature representations that enable strong transfer learning for diverse tasks. However, their efficiency as…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Alon Kaya, Igal Bilik, Inna Stainvas

CMT: Convolutional Neural Networks Meet Vision Transformers

Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image. However, there are still gaps in both performance and computational cost between…

Computer Vision and Pattern Recognition · Computer Science 2022-06-15 Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu

Agricultural Plantation Classification using Transfer Learning Approach based on CNN

Hyper-spectral images are images captured from a satellite that gives spatial and spectral information of specific region. A Hyper-spectral image contains much more number of channels as compared to a RGB image, hence containing more…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Uphar Singh, Tushar Musale, Ranjana Vyas, O. P. Vyas

Vision Transformer for Contrastive Clustering

Vision Transformer (ViT) has shown its advantages over the convolutional neural network (CNN) with its ability to capture global long-range dependencies for visual representation learning. Besides ViT, contrastive learning is another…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Hua-Bao Ling, Bowen Zhu, Dong Huang, Ding-Hua Chen, Chang-Dong Wang, Jian-Huang Lai

Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation

Navigation and mobility are some of the major problems faced by visually impaired people in their daily lives. Advances in computer vision led to the proposal of some navigation systems. However, most of them require expensive and/or heavy…

Computer Vision and Pattern Recognition · Computer Science 2020-05-12 Fabricio Breve, Carlos Norberto Fischer