Related papers: Learning Spatio-Temporal Transformer for Visual Tr…

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

In video object tracking, there exist rich temporal contexts among successive frames, which have been largely overlooked in existing trackers. In this work, we bridge the individual video frames and explore the temporal contexts across them…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Ning Wang , Wengang Zhou , Jie Wang , Houqiang Li

Local Perception-Aware Transformer for Aerial Tracking

Transformer-based visual object tracking has been utilized extensively. However, the Transformer structure lacks sufficient inductive bias. In addition, focusing only on encoding the global feature harms the modeling of local details,…

Computer Vision and Pattern Recognition · Computer Science 2022-08-09 Changhong Fu , Weiyu Peng , Sihang Li , Junjie Ye , Ziang Cao

Track Targets by Dense Spatio-Temporal Position Encoding

In this work, we propose a novel paradigm to encode the position of targets for target tracking in videos using transformers. The proposed paradigm, Dense Spatio-Temporal (DST) position encoding, encodes spatio-temporal position information…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Jinkun Cao , Hao Wu , Kris Kitani

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression…

Computer Vision and Pattern Recognition · Computer Science 2020-05-29 Nicolas Carion , Francisco Massa , Gabriel Synnaeve , Nicolas Usunier , Alexander Kirillov , Sergey Zagoruyko

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model…

Computer Vision and Pattern Recognition · Computer Science 2022-06-01 Peng Dai , Yiqiang Feng , Renliang Weng , Changshui Zhang

Straight to Shapes: Real-time Detection of Encoded Shapes

Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their…

Computer Vision and Pattern Recognition · Computer Science 2017-07-06 Saumya Jetley , Michael Sapienza , Stuart Golodetz , Philip H. S. Torr

TimePerceiver: An Encoder-Decoder Framework for Generalized Time-Series Forecasting

In machine learning, effective modeling requires a holistic consideration of how to encode inputs, make predictions (i.e., decoding), and train the model. However, in time-series forecasting, prior work has predominantly focused on encoder…

Machine Learning · Computer Science 2025-12-30 Jaebin Lee , Hankook Lee

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

The strong demand for autonomous driving in the industry has led to strong interest in 3D object detection and resulted in many excellent 3D object detection algorithms. However, the vast majority of algorithms only model single-frame data,…

Computer Vision and Pattern Recognition · Computer Science 2020-11-30 Zhenxun Yuan , Xiao Song , Lei Bai , Wengang Zhou , Zhe Wang , Wanli Ouyang

TrTr: Visual Tracking with Transformer

Template-based discriminative trackers are currently the dominant tracking methods due to their robustness and accuracy, and the Siamese-network-based methods that depend on cross-correlation operation between features extracted from…

Computer Vision and Pattern Recognition · Computer Science 2021-05-11 Moju Zhao , Kei Okada , Masayuki Inaba

TrackFormer: Multi-Object Tracking with Transformers

The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce…

Computer Vision and Pattern Recognition · Computer Science 2022-05-02 Tim Meinhardt , Alexander Kirillov , Laura Leal-Taixe , Christoph Feichtenhofer

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

The success of visual tracking has been largely driven by datasets with manual box annotations. However, these box annotations require tremendous human effort, limiting the scale and diversity of existing tracking datasets. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2025-07-30 Yaozong Zheng , Bineng Zhong , Qihua Liang , Ning Li , Shuxiang Song

Future Object Detection with Spatiotemporal Transformers

We propose the task Future Object Detection, in which the goal is to predict the bounding boxes for all visible objects in a future video frame. While this task involves recognizing temporal and kinematic patterns, in addition to the…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Adam Tonderski , Joakim Johnander , Christoffer Petersson , Kalle Åström

An Exploration of Target-Conditioned Segmentation Methods for Visual Object Trackers

Visual object tracking is the problem of predicting a target object's state in a video. Generally, bounding-boxes have been used to represent states, and a surge of effort has been spent by the community to produce efficient causal…

Computer Vision and Pattern Recognition · Computer Science 2021-02-02 Matteo Dunnhofer , Niki Martinel , Christian Micheloni

Efficient Training for Visual Tracking with Deformable Transformer

Recent Transformer-based visual tracking models have showcased superior performance. Nevertheless, prior works have been resource-intensive, requiring prolonged GPU training hours and incurring high GFLOPs during inference due to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-07 Qingmao Wei , Guotian Zeng , Bi Zeng

ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer

We propose ST-DETR, a Spatio-Temporal Transformer-based architecture for object detection from a sequence of temporal frames. We treat the temporal frames as sequences in both space and time and employ the full attention mechanisms to take…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Eslam Mohamed , Ahmad El-Sallab

Sequence-to-Sequence Prediction of Vehicle Trajectory via LSTM Encoder-Decoder Architecture

In this paper, we propose a deep learning based vehicle trajectory prediction technique which can generate the future trajectory sequence of surrounding vehicles in real time. We employ the encoder-decoder architecture which analyzes the…

Machine Learning · Computer Science 2018-10-23 Seong Hyeon Park , ByeongDo Kim , Chang Mook Kang , Chung Choo Chung , Jun Won Choi

OneTrack-M: A multitask approach to transformer-based MOT models

Multi-Object Tracking (MOT) is a critical problem in computer vision, essential for understanding how objects move and interact in videos. This field faces significant challenges such as occlusions and complex environmental dynamics,…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Luiz C. S. de Araujo , Carlos M. S. Figueiredo

Learning Global Structure Consistency for Robust Object Tracking

Fast appearance variations and the distractions of similar objects are two of the most challenging problems in visual object tracking. Unlike many existing trackers that focus on modeling only the target, in this work, we consider the…

Computer Vision and Pattern Recognition · Computer Science 2020-08-28 Bi Li , Chengquan Zhang , Zhibin Hong , Xu Tang , Jingtuo Liu , Junyu Han , Errui Ding , Wenyu Liu

End-to-End Video Object Detection with Spatial-Temporal Transformers

Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating performance as good as previous complex hand-crafted detectors. However, their performance on…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Lu He , Qianyu Zhou , Xiangtai Li , Li Niu , Guangliang Cheng , Xiao Li , Wenxuan Liu , Yunhai Tong , Lizhuang Ma , Liqing Zhang

Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking

Existing visual object tracking usually learns a bounding-box based template to match the targets across frames, which cannot accurately learn a pixel-wise representation, thereby being limited in handling severe appearance variations. To…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Fei Xie , Wankou Yang , Bo Liu , Kaihua Zhang , Wanli Xue , Wangmeng Zuo