Related papers: Efficient Decoder-free Object Detection with Trans…

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-29 · Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Conditional DETR V2: Efficient Detection Transformer with Box Queries

In this paper, we are interested in Detection Transformer (DETR), an end-to-end object detection approach based on a transformer encoder-decoder architecture without hand-crafted postprocessing, such as NMS. Inspired by Conditional DETR, an…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-20 · Xiaokang Chen, Fangyun Wei, Gang Zeng, Jingdong Wang

Object Detection with Transformers: A Review

The astounding performance of transformers in natural language processing (NLP) has motivated researchers to explore their applications in computer vision tasks. DEtection TRansformer (DETR) introduces transformers to object detection tasks…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-13 · Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-30 · Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

An Extendable, Efficient and Effective Transformer-based Object Detector

Transformers have been widely used in numerous vision problems especially for visual recognition and detection. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-19 · Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Lite DETR : An Interleaved Multi-Scale Encoder for Efficient DETR

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Its success cannot be achieved without the re-introduction of multi-scale feature fusion in the encoder. However, the excessively increased tokens in…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-14 · Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Le-DETR: Revisiting Real-Time Detection Transformer with Efficient Encoder Design

Real-time object detection is crucial for real-world applications as it requires high accuracy with low latency. While Detection Transformers (DETR) have demonstrated significant performance improvements, current real-time DETR models are…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-25 · Jiannan Huang, Aditya Kane, Fengzhe Zhou, Yunchao Wei, Humphrey Shi

Exploring Plain Vision Transformer Backbones for Object Detection

We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-13 · Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He

Toward Transformer-Based Object Detection

Transformers have become the dominant model in natural language processing, owing to their ability to pretrain on massive amounts of data, then transfer to smaller, more specific tasks via fine-tuning. The Vision Transformer was the first…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-21 · Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

Deformable DETR: Deformable Transformers for End-to-End Object Detection

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-19 · Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

DETR is the first end-to-end object detector using a transformer encoder-decoder architecture and demonstrates competitive performance but low computational efficiency on high resolution feature maps. The subsequent work, Deformable DETR,…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-07 · Byungseok Roh, JaeWoong Shin, Wuhyun Shin, Saehoon Kim

RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models

Real-time object detection has achieved substantial progress through meticulously designed architectures and optimization strategies. However, the pursuit of high-speed inference via lightweight network designs often leads to degraded…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-30 · Zijun Liao, Yian Zhao, Xin Shan, Yu Yan, Chang Liu, Lei Lu, Xiangyang Ji, Jie Chen

D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention

DETR is the first fully end-to-end detector that predicts a final set of predictions without post-processing. However, it suffers from problems such as low performance and slow convergence. A series of works aim to tackle these issues in…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-03 · Junyu Lin, Xiaofeng Mao, Yuefeng Chen, Lei Xu, Yuan He, Hui Xue

Anchor DETR: Query Design for Transformer-Based Object Detection

In this paper, we propose a novel query design for the transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-05 · Yingming Wang, Xiangyu Zhang, Tong Yang, Jian Sun

Small Object Detection by DETR via Information Augmentation and Adaptive Feature Fusion

The main challenge for small object detection algorithms is to ensure accuracy while pursuing real-time performance. The RT-DETR model performs well in real-time object detection, but performs poorly in small object detection accuracy. In…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-17 · Ji Huang, Hui Wang

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

Oriented Object Detection with Transformer

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

ComplETR: Reducing the cost of annotations for object detection in dense scenes with vision transformers

Annotating bounding boxes for object detection is expensive, time-consuming, and error-prone. In this work, we propose a DETR based framework called ComplETR that is designed to explicitly complete missing annotations in partially annotated…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-14 · Achin Jain, Kibok Lee, Gurumurthy Swaminathan, Hao Yang, Bernt Schiele, Avinash Ravichandran, Onkar Dabeer

LoFTR: Detector-Free Local Feature Matching with Transformers

We present a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-02 · Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou

DETR++: Taming Your Multi-Scale Detection Transformer

Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12]. With the sweeping reform of Transformers [27] in natural language processing, Carion et al. [2]…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-08 · Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen