Related papers: Explicit Visual Prompts for Visual Object Tracking

200 papers

Multimodal semantic cues, such as textual descriptions, have shown strong potential in enhancing target perception for tracking. However, existing methods rely on static textual descriptions from large language models, which lack…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-16 · Yukuan Zhang, Jiarui Zhao, Shangqing Nie, Jin Kuang, Shengsheng Wang

Refining visual representations by eliminating their internal feature-level redundancy is crucial for simultaneously optimizing the performance and computational cost of models in visual tracking. To enhance their performance, many…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Weijing Wu, Qihua Liang, Bineng Zhong, Haiying Xia, Zhiyi Mo, Shuxiang Song

Vision-language tracking aims to locate the target object in the video sequence using a template patch and a language description provided in the initial frame. To achieve robust tracking, especially in complex long-term scenarios that…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · X. Feng, S. Hu, X. Li, D. Zhang, M. Wu, J. Zhang, X. Chen, K. Huang

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

Foreground segmentation is a fundamental problem in computer vision, which includes salient object detection, forgery detection, defocus blur detection, shadow detection, and camouflage object detection. Previous works have typically relied…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-31 · Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

We consider the generic problem of detecting low-level structures in images, which includes segmenting the manipulated parts, identifying out-of-focus pixels, separating shadow regions, and detecting concealed objects. Whereas each such…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-22 · Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-22 · Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Visual Prompt Tuning (VPT) techniques have gained prominence for their capacity to adapt pre-trained Vision Transformers (ViTs) to downstream visual tasks using specialized learnable tokens termed as prompts. Contemporary VPT methodologies,…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-28 · Shentong Mo, Yansen Wang, Xufang Luo, Dongsheng Li

Parameter efficient transfer learning (PETL) is an emerging research spot that aims to adapt large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage and computation costs. However,…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-13 · Chunqing Ruan, Hongjian Wang

Vision-Language Tracking aims to continuously localize objects described by a visual template and a language description. Existing methods, however, are typically limited to local search, making them prone to failures under viewpoint…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Jingchao Wang, Kaiwen Zhou, Zhijian Wu, Kunhua Ji, Dingjiang Huang, Yefeng Zheng

A main challenge of Visual-Language Tracking (VLT) is the misalignment between visual inputs and language descriptions caused by target movement. Previous trackers have explored many effective feature modification methods to preserve more…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-02 · Yihao Zhen, Qiang Wang, Yu Qiao, Liangqiong Qu, Huijie Fan

The consistency between the semantic information provided by the multi-modal reference and the tracked object is crucial for visual-language (VL) tracking. However, existing VL tracking frameworks rely on static multi-modal references to…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Xiaohai Li, Bineng Zhong, Qihua Liang, Zhiyi Mo, Jian Nong, Shuxiang Song

Boosting the performance of offline-trained Siamese trackers is getting harder, since the fixed information in the template cropped from the first frame has been almost thoroughly mined, yet these trackers remain poorly capable of resisting…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-05 · Zhihong Fu, Qingjie Liu, Zehua Fu, Yunhong Wang

Online contextual reasoning and association across consecutive video frames are critical to perceive instances in visual tracking. However, most current top-performing trackers persistently lean on sparse temporal relationships between…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-04 · Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li

Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-28 · Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

Vision Transformers (ViTs) have demonstrated remarkable capabilities in learning representations, but their performance is compromised when applied to unseen domains. Previous methods either engage in prompt learning during the training…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-11 · Yunbei Zhang, Akshay Mehra, Jihun Hamm

As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-27 · Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu

Unsupervised visual object tracking is a challenging task that requires following arbitrary targets in videos without training on ground-truth annotations. Despite considerable progress, existing state-of-the-art unsupervised trackers often…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Zhengbo Zhang, Zhigang Tu, Junsong Yuan, De Wen Soh, Bo Du

Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-14 · Yushan Han, Kaer Huang

Visual Prompt Tuning (VPT) has become a promising Parameter-Efficient Fine-Tuning (PEFT) approach for Vision Transformer (ViT) models, partially fine-tuning learnable tokens while keeping most model parameters frozen. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Li Ren, Chen Chen, Liqiang Wang, Kien Hua