Related papers: Visual Transformer for Object Detection

Attention Augmented Convolutional Networks

Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information.…

Computer Vision and Pattern Recognition · Computer Science 2020-09-11 Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le

Toward Transformer-Based Object Detection

Transformers have become the dominant model in natural language processing, owing to their ability to pretrain on massive amounts of data, then transfer to smaller, more specific tasks via fine-tuning. The Vision Transformer was the first…

Computer Vision and Pattern Recognition · Computer Science 2020-12-21 Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

A Comprehensive Study of Vision Transformers on Dense Prediction Tasks

Convolutional Neural Networks (CNNs), architectures consisting of convolutional layers, have been the standard choice in vision tasks. Recent studies have shown that Vision Transformers (VTs), architectures based on self-attention modules,…

Computer Vision and Pattern Recognition · Computer Science 2022-01-24 Kishaan Jeeveswaran, Senthilkumar Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, Elahe Arani

A Survey on Visual Transformer

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao

CNN-transformer mixed model for object detection

Object detection, one of the three main tasks of computer vision, has been used in various applications. The main process is to use deep neural networks to extract the features of an image and then use the features to identify the class and…

Computer Vision and Pattern Recognition · Computer Science 2022-12-14 Wenshuo Li

Deformable ConvNets v2: More Deformable, Better Results

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci

Vision Transformer with Convolutions Architecture Search

Transformers exhibit great advantages in handling computer vision tasks. They model image classification tasks by utilizing a multi-head attention mechanism to process a series of patches consisting of split images. However, for complex…

Computer Vision and Pattern Recognition · Computer Science 2022-03-22 Haichao Zhang, Kuangrong Hao, Witold Pedrycz, Lei Gao, Xuesong Tang, Bing Wei

ConTNet: Why not use convolution and transformer at the same time?

Although convolutional networks (ConvNets) have enjoyed great success in computer vision (CV), it suffers from capturing global information crucial to dense prediction tasks such as object detection and segmentation. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2021-05-12 Haotian Yan, Zhe Li, Weijian Li, Changhu Wang, Ming Wu, Chuang Zhang

Object Detection with Transformers: A Review

The astounding performance of transformers in natural language processing (NLP) has motivated researchers to explore their applications in computer vision tasks. DEtection TRansformer (DETR) introduces transformers to object detection tasks…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

DETR++: Taming Your Multi-Scale Detection Transformer

Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12]. With the sweeping reform of Transformers [27] in natural language processing, Carion et al. [2]…

Computer Vision and Pattern Recognition · Computer Science 2022-06-08 Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen

Focal Self-attention for Local-Global Interactions in Vision Transformers

Recently, Vision Transformer and its variants have shown great promise on various computer vision tasks. The ability of capturing short- and long-range visual dependencies through self-attention is arguably the main source for the success.…

Computer Vision and Pattern Recognition · Computer Science 2021-07-02 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

KVT: k-NN Attention for Boosting Vision Transformers

Convolutional Neural Networks (CNNs) have dominated computer vision for years, due to its ability in capturing locality and translation invariance. Recently, many vision transformer architectures have been proposed and they show promising…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin

Object Detection with Deep Learning: A Review

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable…

Computer Vision and Pattern Recognition · Computer Science 2019-04-17 Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, Xindong Wu

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

It is a challenging task to learn discriminative representation from images and videos, due to large local redundancy and complex global dependency in these visual data. Convolution neural networks (CNNs) and vision transformers (ViTs) have…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation

Transformer and its variants have shown great potential for various vision tasks in recent years, including image classification, object detection and segmentation. Meanwhile, recent studies also reveal that with proper architecture design,…

Computer Vision and Pattern Recognition · Computer Science 2025-02-28 Xinghao Chen, Siwei Li, Yijing Yang, Yunhe Wang

Research Progress of Convolutional Neural Network and its Application in Object Detection

With the improvement of computer performance and the increase of data volume, the object detection based on convolutional neural network (CNN) has become the main algorithm for object detection. This paper summarizes the research progress…

Computer Vision and Pattern Recognition · Computer Science 2020-07-28 Wei Zhang, Zuoxiang Zeng

Are Convolutional Neural Networks or Transformers more like human vision?

Modern machine learning models for computer vision exceed humans in accuracy on specific visual recognition tasks, notably on datasets like ImageNet. However, high accuracy can be achieved in many ways. The particular decision function…

Computer Vision and Pattern Recognition · Computer Science 2021-07-02 Shikhar Tuli, Ishita Dasgupta, Erin Grant, Thomas L. Griffiths

Hands-on Evaluation of Visual Transformers for Object Recognition and Detection

Convolutional Neural Networks (CNNs) for computer vision sometimes struggle with understanding images in a global context, as they mainly focus on local patterns. On the other hand, Vision Transformers (ViTs), inspired by models originally…

Computer Vision and Pattern Recognition · Computer Science 2025-12-11 Dimitrios N. Vlachogiannis, Dimitrios A. Koutsomitropoulos

AttentionNet: Aggregating Weak Directions for Accurate Object Detection

We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast an object detection problem as an iterative classification problem, which is the most suitable form of a CNN. AttentionNet…

Computer Vision and Pattern Recognition · Computer Science 2015-09-29 Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S. Paek, In So Kweon