Related papers: DecoderTracker: Decoder-Only Method for Multiple-O…

MOTR: End-to-End Multiple-Object Tracking with Transformer

Temporal modeling of objects is a key challenge in multiple object tracking (MOT). Existing methods track by associating detections through motion-based and appearance-based similarity heuristics. The post-processing nature of association…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-20 · Fangao Zeng, Bin Dong, Yuang Zhang, Tiancai Wang, Xiangyu Zhang, Yichen Wei

SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors

End-to-end transformer architectures have driven significant progress in multi-object tracking by unifying detection and association into a single, heuristic-free framework. Despite these benefits, poor detection performance and the…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Fabian Gülhan, Emil Mededovic, Yuli Wu, Johannes Stegmaier

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

DETR is the first end-to-end object detector using a transformer encoder-decoder architecture and demonstrates competitive performance but low computational efficiency on high resolution feature maps. The subsequent work, Deformable DETR,…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-07 · Byungseok Roh, JaeWoong Shin, Wuhyun Shin, Saehoon Kim

Efficient Training for Visual Tracking with Deformable Transformer

Recent Transformer-based visual tracking models have showcased superior performance. Nevertheless, prior works have been resource-intensive, requiring prolonged GPU training hours and incurring high GFLOPs during inference due to…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-07 · Qingmao Wei, Guotian Zeng, Bi Zeng

Efficient DETR: Improving End-to-End Object Detector with Dense Prior

The recently proposed end-to-end transformer detectors, such as DETR and Deformable DETR, have a cascade structure of stacking 6 decoder layers to update object queries iteratively, without which their performance degrades seriously. In…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-06 · Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-29 · Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Their success cannot be achieved without the re-introduction of multi-scale feature fusion in the encoder. However, the excessively increased tokens in…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-14 · Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

MDS-DETR: DETR with Masked Duplicate Suppressor

The DEtection TRansformer (DETR) is a powerful end-to-end object detector, yet its one-to-one matching strategy suffers from slow convergence and low recall. A common approach to address this issue is to use one-to-many label assignment to…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-25 · Chanho Lee, Seunghee Koh, Yunho Jeon, Junmo Kim

Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding

Multimodal transformer exhibits high capacity and flexibility to align image and text for visual grounding. However, the existing encoder-only grounding framework (e.g., TransVG) suffers from heavy computation due to the self-attention…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-27 · Fengyuan Shi, Ruopeng Gao, Weilin Huang, Limin Wang

FastTrackTr: Towards Fast Multi-Object Tracking with Transformers

Transformer-based multi-object tracking (MOT) methods have captured the attention of many researchers in recent years. However, these models often suffer from slow inference speeds due to their structure or other issues. To address this…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-31 · Pan Liao, Feng Yang, Di Wu, Jinwen Yu, Wenhui Zhao, Dingwen Zhang

DepTR-MOT: Unveiling the Potential of Depth-Informed Trajectory Refinement for Multi-Object Tracking

Visual Multi-Object Tracking (MOT) is a crucial component of robotic perception, yet existing Tracking-By-Detection (TBD) methods often rely on 2D cues, such as bounding boxes and motion modeling, which struggle under occlusions and…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Buyin Deng, Lingxin Huang, Kai Luo, Fei Teng, Kailun Yang

Deformable DETR: Deformable Transformers for End-to-End Object Detection

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-19 · Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai

MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-22 · Ruopeng Gao, Limin Wang

D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention

DETR is the first fully end-to-end detector that predicts a final set of predictions without post-processing. However, it suffers from problems such as low performance and slow convergence. A series of works aim to tackle these issues in…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-03 · Junyu Lin, Xiaofeng Mao, Yuefeng Chen, Lei Xu, Yuan He, Hui Xue

Learning Spatio-Temporal Transformer for Visual Tracking

In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-01 · Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu

Motion-Aware Transformer for Multi-Object Tracking

Multi-object tracking (MOT) in videos remains challenging due to complex object motions and crowded scenes. Recent DETR-based frameworks offer end-to-end solutions but typically process detection and tracking queries jointly within a single…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Xu Yang, Gady Agam

TrackFormer: Multi-Object Tracking with Transformers

The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-02 · Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer

OneTrack-M: A multitask approach to transformer-based MOT models

Multi-Object Tracking (MOT) is a critical problem in computer vision, essential for understanding how objects move and interact in videos. This field faces significant challenges such as occlusions and complex environmental dynamics,…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-10 · Luiz C. S. de Araujo, Carlos M. S. Figueiredo

Dense Object Detection Based on De-homogenized Queries

Dense object detection is widely used in automatic driving, video surveillance, and other fields. This paper focuses on the challenging task of dense object detection. Currently, detection methods based on greedy algorithms, such as…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-12 · Yueming Huang, Chenrui Ma, Hao Zhou, Hao Wu, Guowu Yuan

An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention

This study proposes an improved end-to-end multi-target tracking algorithm that adapts to multi-view multi-scale scenes based on the self-attentive mechanism of the transformer's encoder-decoder structure. A multi-dimensional feature…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-14 · Yong Hong, Deren Li, Shupei Luo, Xin Chen, Yi Yang, Mi Wang