Related papers: Improving Visual Object Tracking through Visual Pr…

Visual Prompt Multi-Modal Tracking

Visible-modal object tracking gives rise to a series of downstream multi-modal tracking tributaries. To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-28 · Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu

Explicitly Modeling the Discriminability for Instance-Aware Visual Object Tracking

Visual object tracking performance has been dramatically improved in recent years, but some severe challenges remain open, like distractors and occlusions. We suspect the reason is that the feature representations of the tracking targets…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-29 · Mengmeng Wang, Xiaoqian Yang, Yong Liu

SAMITE: Position Prompted SAM2 with Calibrated Memory for Visual Object Tracking

Visual Object Tracking (VOT) is widely used in applications like autonomous driving to continuously track targets in videos. Existing methods can be roughly categorized into template matching and autoregressive methods, where the former…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-30 · Qianxiong Xu, Lanyun Zhu, Chenxi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao

IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking

Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain,…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-01 · Run Luo, Zikai Song, Longze Chen, Yunshui Li, Min Yang, Wei Yang

TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT

While Multi-Object Tracking (MOT) has made substantial advancements, it is limited by heavy reliance on prior knowledge and limited to predefined categories. In contrast, Generic Multiple Object Tracking (GMOT), tracking multiple objects…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-05 · Duy Le Dinh Anh, Kim Hoang Tran, Ngan Hoang Le

Exploration of visual prompt in Grounded pre-trained open-set detection

Text prompts are crucial for generalizing pre-trained open-set object detection models to new categories. However, current methods for text prompts are limited as they require manual feedback when generalizing to new categories, which…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-15 · Qibo Chen, Weizhong Jin, Shuchang Li, Mengdi Liu, Li Yu, Jian Jiang, Xiaozheng Wang

VPTracker: Global Vision-Language Tracking via Visual Prompt

Vision-Language Tracking aims to continuously localize objects described by a visual template and a language description. Existing methods, however, are typically limited to local search, making them prone to failures under viewpoint…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Jingchao Wang, Kaiwen Zhou, Zhijian Wu, Kunhua Ji, Dingjiang Huang, Yefeng Zheng

Leveraging Text-to-Image Diffusion Models for Unsupervised Visual Object Tracking

Unsupervised visual object tracking is a challenging task that requires following arbitrary targets in videos without training on ground-truth annotations. Despite considerable progress, existing state-of-the-art unsupervised trackers often…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Zhengbo Zhang, Zhigang Tu, Junsong Yuan, De Wen Soh, Bo Du

Explicit Visual Prompts for Visual Object Tracking

How to effectively exploit spatio-temporal information is crucial to capture target appearance changes in visual tracking. However, most deep learning-based trackers mainly focus on designing a complicated appearance model or template…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-09 · Liangtao Shi, Bineng Zhong, Qihua Liang, Ning Li, Shengping Zhang, Xianxian Li

MAVOT: Memory-Augmented Video Object Tracking

We introduce a one-shot learning approach for video object tracking. The proposed algorithm requires seeing the object to be tracked only once, and employs an external memory to store and remember the evolving features of the foreground…

Computer Vision and Pattern Recognition · Computer Science · 2017-11-28 · Boyu Liu, Yanzhao Wang, Yu-Wing Tai, Chi-Keung Tang

Multi-Granularity Language-Guided Training for Multi-Object Tracking

Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Yuhao Li, Jiale Cao, Muzammal Naseer, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Incremental Object Detection with Prompt-based Methods

Visual prompt-based methods have seen growing interest in incremental learning (IL) for image classification. These approaches learn additional embedding vectors while keeping the model frozen, making them efficient to train. However, no…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-08 · Matthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel, Luc Van Gool

Beyond SOT: Tracking Multiple Generic Objects at Once

Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-27 · Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc Van Gool, Alina Kuznetsova

DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts

Visual prompted object detection enables interactive and flexible definition of target categories, thereby facilitating open-vocabulary detection. Since visual prompts are derived directly from image features, they often outperform text…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Bo Qian, Dahu Shi, Xing Wei

Improving tracking with a tracklet associator

Multiple object tracking (MOT) is a task in computer vision that aims to detect the position of various objects in videos and to associate them to a unique identity. We propose an approach based on Constraint Programming (CP) whose goal is…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-25 · Rémi Nahon, Guillaume-Alexandre Bilodeau, Gilles Pesant

Z-GMOT: Zero-shot Generic Multiple Object Tracking

Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories and struggles with unseen objects. To address these issues, Generic Multiple Object Tracking…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-14 · Kim Hoang Tran, Anh Duy Le Dinh, Tien Phat Nguyen, Thinh Phan, Pha Nguyen, Khoa Luu, Donald Adjeroh, Gianfranco Doretto, Ngan Hoang Le

PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training

Open-Set Object Detection (OSOD) enables recognition of novel categories beyond fixed classes but faces challenges in aligning text representations with complex visual concepts and the scarcity of image-text pairs for rare categories. This…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-08 · Weifu Fu, Jinyang Li, Bin-Bin Gao, Jialin Li, Yuhuan Lin, Hanqiu Deng, Wenbing Tao, Yong Liu, Chengjie Wang

EPIPTrack: Rethinking Prompt Modeling with Explicit and Implicit Prompts for Multi-Object Tracking

Multimodal semantic cues, such as textual descriptions, have shown strong potential in enhancing target perception for tracking. However, existing methods rely on static textual descriptions from large language models, which lack…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-16 · Yukuan Zhang, Jiarui Zhao, Shangqing Nie, Jin Kuang, Shengsheng Wang

Progressive Multi-modal Conditional Prompt Tuning

Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-25 · Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization

Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains. Existing works primarily focus on improving the generalization ability of static networks.…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-29 · Deng Li, Aming Wu, Yaowei Wang, Yahong Han