Related papers: Explicit Visual Prompts for Visual Object Tracking

EPIPTrack: Rethinking Prompt Modeling with Explicit and Implicit Prompts for Multi-Object Tracking

Multimodal semantic cues, such as textual descriptions, have shown strong potential in enhancing target perception for tracking. However, existing methods rely on static textual descriptions from large language models, which lack…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-16 · Yukuan Zhang, Jiarui Zhao, Shangqing Nie, Jin Kuang, Shengsheng Wang

An Efficient Token Compression Framework for Visual Object Tracking

Refining visual representations by eliminating their internal feature-level redundancy is crucial for simultaneously optimizing the performance and computational cost of models in visual tracking. To enhance their performance, many…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Weijing Wu, Qihua Liang, Bineng Zhong, Haiying Xia, Zhiyi Mo, Shuxiang Song

ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking

Vision-language tracking aims to locate the target object in the video sequence using a template patch and a language description provided in the initial frame. To achieve robust tracking, especially in complex long-term scenarios that…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · X. Feng, S. Hu, X. Li, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang

ProContEXT: Exploring Progressive Context Transformer for Tracking

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Explicit Visual Prompting for Universal Foreground Segmentations

Foreground segmentation is a fundamental problem in computer vision, which includes salient object detection, forgery detection, defocus blur detection, shadow detection, and camouflage object detection. Previous works have typically relied…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-31 · Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

Explicit Visual Prompting for Low-Level Structure Segmentations

We consider the generic problem of detecting low-level structures in images, which includes segmenting the manipulated parts, identifying out-of-focus pixels, separating shadow regions, and detecting concealed objects. Whereas each such…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-22 · Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

Towards Real-World Visual Tracking with Temporal Contexts

Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-22 · Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning

Visual Prompt Tuning (VPT) techniques have gained prominence for their capacity to adapt pre-trained Vision Transformers (ViTs) to downstream visual tasks using specialized learnable tokens termed prompts. Contemporary VPT methodologies,…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-28 · Shentong Mo, Yansen Wang, Xufang Luo, Dongsheng Li

Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

Parameter efficient transfer learning (PETL) is an emerging research spot that aims to adapt large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage and computation costs. However,…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-13 · Chunqing Ruan, Hongjian Wang

VPTracker: Global Vision-Language Tracking via Visual Prompt

Vision-Language Tracking aims to continuously localize objects described by a visual template and a language description. Existing methods, however, are typically limited to local search, making them prone to failures under viewpoint…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Jingchao Wang, Kaiwen Zhou, Zhijian Wu, Kunhua Ji, Dingjiang Huang, Yefeng Zheng

ATSTrack: Enhancing Visual-Language Tracking by Aligning Temporal and Spatial Scales

A main challenge of Visual-Language Tracking (VLT) is the misalignment between visual inputs and language descriptions caused by target movement. Previous trackers have explored many effective feature modification methods to preserve more…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-02 · Yihao Zhen, Qiang Wang, Yu Qiao, Liangqiong Qu, Huijie Fan

Dynamic Updates for Language Adaptation in Visual-Language Tracking

The consistency between the semantic information provided by the multi-modal reference and the tracked object is crucial for visual-language (VL) tracking. However, existing VL tracking frameworks rely on static multi-modal references to…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Xiaohai Li, Bineng Zhong, Qihua Liang, Zhiyi Mo, Jian Nong, Shuxiang Song

STMTrack: Template-free Visual Tracking with Space-time Memory Networks

Boosting the performance of offline-trained Siamese trackers is getting harder nowadays since the fixed information of the template cropped from the first frame has been almost thoroughly mined, but they are poorly capable of resisting…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-05 · Zhihong Fu, Qingjie Liu, Zehua Fu, Yunhong Wang

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Online contextual reasoning and association across consecutive video frames are critical to perceive instances in visual tracking. However, most current top-performing trackers persistently lean on sparse temporal relationships between…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-04 · Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li

Revisiting the Power of Prompt for Visual Tuning

Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-28 · Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation

Vision Transformers (ViTs) have demonstrated remarkable capabilities in learning representations, but their performance is compromised when applied to unseen domains. Previous methods either engage in prompt learning during the training…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-11 · Yunbei Zhang, Akshay Mehra, Jihun Hamm

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-27 · Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu

Leveraging Text-to-Image Diffusion Models for Unsupervised Visual Object Tracking

Unsupervised visual object tracking is a challenging task that requires following arbitrary targets in videos without training on ground-truth annotations. Despite considerable progress, existing state-of-the-art unsupervised trackers often…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Zhengbo Zhang, Zhigang Tu, Junsong Yuan, De Wen Soh, Bo Du

ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking

Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-14 · Yushan Han, Kaer Huang

DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers

Visual Prompt Tuning (VPT) has become a promising Parameter-Efficient Fine-Tuning (PEFT) approach for Vision Transformer (ViT) models, fine-tuning only a small set of learnable tokens while keeping most model parameters frozen. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Li Ren, Chen Chen, Liqiang Wang, Kien Hua