Related papers: ProContEXT: Exploring Progressive Context Transfor…

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

In video object tracking, there exist rich temporal contexts among successive frames, which have been largely overlooked in existing trackers. In this work, we bridge the individual video frames and explore the temporal contexts across them…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Ning Wang, Wengang Zhou, Jie Wang, Houqiang Li

ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking

Vision-language tracking aims to locate the target object in the video sequence using a template patch and a language description provided in the initial frame. To achieve robust tracking, especially in complex long-term scenarios that…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 X. Feng, S. Hu, X. Li, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang

Explicit Visual Prompts for Visual Object Tracking

How to effectively exploit spatio-temporal information is crucial to capture target appearance changes in visual tracking. However, most deep learning-based trackers mainly focus on designing a complicated appearance model or template…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Liangtao Shi, Bineng Zhong, Qihua Liang, Ning Li, Shengping Zhang, Xianxian Li

ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking

Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between…

Computer Vision and Pattern Recognition · Computer Science 2024-03-14 Yushan Han, Kaer Huang

Towards Real-World Visual Tracking with Temporal Contexts

Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

SparseTT: Visual Tracking with Sparse Transformers

Transformers have been successfully applied to the visual tracking task and significantly promote tracking performance. The self-attention mechanism designed to model long-range dependencies is the key to the success of Transformers.…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Zhihong Fu, Zehua Fu, Qingjie Liu, Wenrui Cai, Yunhong Wang

Context-aware Visual Tracking with Joint Meta-updating

Visual object tracking acts as a pivotal component in various emerging video applications. Despite the numerous developments in visual tracking, existing deep trackers are still likely to fail when tracking against objects with dramatic…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Qiuhong Shen, Xin Li, Fanyang Meng, Yongsheng Liang

TrackFormer: Multi-Object Tracking with Transformers

The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce…

Computer Vision and Pattern Recognition · Computer Science 2022-05-02 Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixé, Christoph Feichtenhofer

CXTrack: Improving 3D Point Cloud Tracking with Contextual Information

3D single object tracking plays an essential role in many applications, such as autonomous driving. It remains a challenging problem due to the large appearance variation and the sparsity of points caused by occlusion and limited sensor…

Computer Vision and Pattern Recognition · Computer Science 2023-03-20 Tian-Xing Xu, Yuan-Chen Guo, Yu-Kun Lai, Song-Hai Zhang

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. In this paper, we propose a novel solution named TransSTAM, which leverages Transformer to effectively model…

Computer Vision and Pattern Recognition · Computer Science 2022-06-01 Peng Dai, Yiqiang Feng, Renliang Weng, Changshui Zhang

Prompted Contextual Transformer for Incomplete-View CT Reconstruction

Incomplete-view computed tomography (CT) can shorten the data acquisition time and allow scanning of large objects, including sparse-view and limited-angle scenarios, each with various settings, such as different view numbers or angular…

Image and Video Processing · Electrical Eng. & Systems 2024-03-13 Chenglong Ma, Zilong Li, Junjun He, Junping Zhang, Yi Zhang, Hongming Shan

IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking

Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Run Luo, Zikai Song, Longze Chen, Yunshui Li, Min Yang, Wei Yang

Less is More: Token Context-aware Learning for Object Tracking

Recently, several studies have shown that utilizing contextual information to perceive target states is crucial for object tracking. They typically capture context by incorporating multiple video frames. However, these naive frame-context…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose a solution named TransMOT, which leverages powerful graph transformers to efficiently model the spatial and…

Computer Vision and Pattern Recognition · Computer Science 2021-04-06 Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

Contextual Transformer Networks for Visual Recognition

Transformer with self-attention has led to the revolutionizing of natural language processing field, and recently inspires the emergence of Transformer-style architecture design with competitive results in numerous computer vision tasks.…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Yehao Li, Ting Yao, Yingwei Pan, Tao Mei

PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection

Recent years have witnessed a trend of applying context frames to boost the performance of object detection as video object detection. Existing methods usually aggregate features at one stroke to enhance the feature. These methods, however,…

Computer Vision and Pattern Recognition · Computer Science 2022-09-07 Han Wang, Jun Tang, Xiaodong Liu, Shanyan Guan, Rong Xie, Li Song

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Tracking by natural language specification (TNL) aims to consistently localize a target in a video sequence given a linguistic description in the initial frame. Existing methodologies perform language-based and template-based matching for…

Computer Vision and Pattern Recognition · Computer Science 2024-04-01 Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen

eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking

The unique complementarity of frame-based and event cameras for high frame rate object tracking has recently inspired some research attempts to develop multi-modal fusion approaches. However, these methods directly fuse both modalities and…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Yucheng Chen, Lin Wang

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

Visual Object Tracking (VOT) aims to estimate the positions of target objects in a video sequence, which is an important vision task with various real-world applications. Depending on whether the initial states of target objects are…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Junke Wang, Zuxuan Wu, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang

MambaLCT: Boosting Tracking via Long-term Context State Space Model

Effectively constructing context information with long-term dependencies from video sequences is crucial for object tracking. However, the context length constructed by existing work is limited, only considering object information from…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Xiaohai Li, Bineng Zhong, Qihua Liang, Guorong Li, Zhiyi Mo, Shuxiang Song