Related papers: Toward Transformer-Based Object Detection

200 papers

Convolutional neural networks (CNNs) have been the first choice of paradigm in many computer vision applications. The convolution operation, however, has a significant weakness: it only operates on a local neighborhood of pixels, thus…

Computer Vision and Pattern Recognition · Computer Science 2022-06-14 Michael Yang

We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical…

Computer Vision and Pattern Recognition · Computer Science 2022-06-13 Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He

Object detection is a central downstream task used to test if pre-trained network parameters confer benefits, such as improved accuracy or training speed. The complexity of object detection methods can make this benchmarking non-trivial…

Computer Vision and Pattern Recognition · Computer Science 2021-11-23 Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollar, Kaiming He, Ross Girshick

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao

Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences. Most existing works mainly tackle this problem by reusing the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-03 Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang

Transformers exhibit great advantages in handling computer vision tasks. For image classification, they use a multi-head attention mechanism to process a sequence of patches obtained by splitting the image. However, for complex…

Computer Vision and Pattern Recognition · Computer Science 2022-03-22 Haichao Zhang, Kuangrong Hao, Witold Pedrycz, Lei Gao, Xuesong Tang, Bing Wei

Transformers have been widely used in numerous vision problems especially for visual recognition and detection. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the…

Computer Vision and Pattern Recognition · Computer Science 2022-04-19 Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Existing visual change detectors usually adopt CNNs or Transformers for feature representation learning and focus on learning effective representation for the changed regions between images. Although good performance can be obtained by…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Bo Jiang, Zitian Wang, Xixi Wang, Ziyan Zhang, Lan Chen, Xiao Wang, Bin Luo

Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Convolutional Neural Networks (CNNs), architectures consisting of convolutional layers, have been the standard choice in vision tasks. Recent studies have shown that Vision Transformers (VTs), architectures based on self-attention modules,…

Computer Vision and Pattern Recognition · Computer Science 2022-01-24 Kishaan Jeeveswaran, Senthilkumar Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, Elahe Arani

Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering works have recently been done on employing…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, Zhiqiang He

Convolutional Neural Networks (CNNs) for computer vision sometimes struggle with understanding images in a global context, as they mainly focus on local patterns. On the other hand, Vision Transformers (ViTs), inspired by models originally…

Computer Vision and Pattern Recognition · Computer Science 2025-12-11 Dimitrios N. Vlachogiannis, Dimitrios A. Koutsomitropoulos

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This…

Computer Vision and Pattern Recognition · Computer Science 2022-03-07 Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

The core for tackling the fine-grained visual categorization (FGVC) is to learn subtle yet discriminative features. Most previous works achieve this by explicitly selecting the discriminative parts or integrating the attention mechanism via…

Computer Vision and Pattern Recognition · Computer Science 2022-03-02 Jun Wang, Xiaohan Yu, Yongsheng Gao

Vision transformers (ViTs) and large-scale convolutional neural networks (CNNs) have reshaped computer vision through pretrained feature representations that enable strong transfer learning for diverse tasks. However, their efficiency as…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Alon Kaya, Igal Bilik, Inna Stainvas

Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain…

Computer Vision and Pattern Recognition · Computer Science 2024-06-19 Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

Recently, several Vision Transformer (ViT) based methods have been proposed for Fine-Grained Visual Classification (FGVC). These methods significantly surpass existing CNN-based ones, demonstrating the effectiveness of ViT in FGVC…

Computer Vision and Pattern Recognition · Computer Science 2022-03-25 Zi-Chao Zhang, Zhen-Duo Chen, Yongxin Wang, Xin Luo, Xin-Shun Xu

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional…

Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed…

Computer Vision and Pattern Recognition · Computer Science 2021-12-01 Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

Recently, Vision Transformers (ViTs) have achieved unprecedented effectiveness in the general domain of image classification. Nonetheless, these models remain underexplored in the field of deepfake detection, given their lower performance…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Dat Nguyen, Marcella Astrid, Enjie Ghorbel, Djamila Aouada