Related papers: MODETR: Moving Object Detection with Transformers

ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer

We propose ST-DETR, a Spatio-Temporal Transformer-based architecture for object detection from a sequence of temporal frames. We treat the temporal frames as sequences in both space and time and employ the full attention mechanisms to take…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Eslam Mohamed, Ahmad El-Sallab

Spatio-Temporal Multi-Task Learning Transformer for Joint Moving Object Detection and Segmentation

Moving objects have special importance for Autonomous Driving tasks. Detecting moving objects can be posed as Moving Object Segmentation, by segmenting the object pixels, or Moving Object Detection, by generating a bounding box for the…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Eslam Mohamed, Ahmed El-Sallab

End-to-End Video Object Detection with Spatial-Temporal Transformers

Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors. However, their performance on…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang

MOTR: End-to-End Multiple-Object Tracking with Transformer

Temporal modeling of objects is a key challenge in multiple object tracking (MOT). Existing methods track by associating detections through motion-based and appearance-based similarity heuristics. The post-processing nature of association…

Computer Vision and Pattern Recognition · Computer Science 2022-07-20 Fangao Zeng, Bin Dong, Yuang Zhang, Tiancai Wang, Xiangyu Zhang, Yichen Wei

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model…

Computer Vision and Pattern Recognition · Computer Science 2022-06-01 Peng Dai, Yiqiang Feng, Renliang Weng, Changshui Zhang

RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving

Moving Object Detection (MOD) is a critical task for autonomous vehicles as moving objects represent higher collision risk than static ones. The trajectory of the ego-vehicle is planned based on the future states of detected moving objects.…

Computer Vision and Pattern Recognition · Computer Science 2019-12-03 Mohamed Ramzy, Hazem Rashed, Ahmad El Sallab, Senthil Yogamani

TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors. However, their…

Computer Vision and Pattern Recognition · Computer Science 2022-11-23 Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao

MODNet: Moving Object Detection Network with Motion and Appearance for Autonomous Driving

We propose a novel multi-task learning system that combines appearance and motion cues for a better semantic reasoning of the environment. A unified architecture for joint vehicle detection and motion segmentation is introduced. In this…

Computer Vision and Pattern Recognition · Computer Science 2018-10-19 Mennatullah Siam, Heba Mahgoub, Mohamed Zahran, Senthil Yogamani, Martin Jagersand, Ahmad El-Sallab

Contrastive Learning for Multi-Object Tracking with Transformers

The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to…

Computer Vision and Pattern Recognition · Computer Science 2025-05-16 Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression…

Computer Vision and Pattern Recognition · Computer Science 2020-05-29 Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Oriented Object Detection with Transformer

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented…

Computer Vision and Pattern Recognition · Computer Science 2021-06-08 Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment

Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics. Despite considerable advancements, existing MOT methodologies tend to falter…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Hamza Mukhtar, Muhammad Usman Ghani Khan

VM-MODNet: Vehicle Motion aware Moving Object Detection for Autonomous Driving

Moving object Detection (MOD) is a critical task in autonomous driving as moving agents around the ego-vehicle need to be accurately detected for safe trajectory planning. It also enables appearance agnostic detection of objects based on…

Computer Vision and Pattern Recognition · Computer Science 2021-07-13 Hazem Rashed, Ahmad El Sallab, Senthil Yogamani

Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection

The Detection Transformer (DETR) has revolutionized the design of CNN-based object detection systems, showcasing impressive performance. However, its potential in the domain of multi-frame 3D object detection remains largely unexplored. In…

Computer Vision and Pattern Recognition · Computer Science 2025-08-21 Yifan Zhang, Zhiyu Zhu, Junhui Hou, Dapeng Wu

Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yiming Cui, Linjie Yang, Haichao Yu

Rethinking Transformer-based Set Prediction for Object Detection

DETR is a recently proposed Transformer-based method which views object detection as a set prediction problem and achieves state-of-the-art performance but demands extra-long training time to converge. In this paper, we investigate the…

Computer Vision and Pattern Recognition · Computer Science 2021-10-13 Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani

Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos

In recent years, transformer architectures have been growing in popularity. Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression…

Computer Vision and Pattern Recognition · Computer Science 2022-09-22 Tomás Crisol, Joel Ermantraut, Adrián Rostagno, Santiago L. Aggio, Javier Iparraguirre

Object Detection with Transformers: A Review

The astounding performance of transformers in natural language processing (NLP) has motivated researchers to explore their applications in computer vision tasks. DEtection TRansformer (DETR) introduces transformers to object detection tasks…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed…

Computer Vision and Pattern Recognition · Computer Science 2021-10-13 Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra, Nicolas Carion

State Space Model Meets Transformer: A New Paradigm for 3D Object Detection

DETR-based methods, which use multi-layer transformer decoders to refine object queries iteratively, have shown promising performance in 3D indoor object detection. However, the scene point features in the transformer decoder remain fixed,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Chuxin Wang, Wenfei Yang, Xiang Liu, Tianzhu Zhang