Related papers: An Efficient Token Compression Framework for Visua…

UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking

One-stream Transformer-based trackers achieve advanced performance in visual object tracking but suffer from significant computational overhead that hinders real-time deployment. While token pruning offers a path to efficiency, existing…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Hao Wu, Xudong Wang, Jialiang Zhang, Junlong Tong, Xinghao Chen, Junyan Lin, Yunpu Ma, Xiaoyu Shen

Explicit Visual Prompts for Visual Object Tracking

How to effectively exploit spatio-temporal information is crucial to capture target appearance changes in visual tracking. However, most deep learning-based trackers mainly focus on designing a complicated appearance model or template…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Liangtao Shi, Bineng Zhong, Qihua Liang, Ning Li, Shengping Zhang, Xianxian Li

Efficient Visual Tracking with Exemplar Transformers

The design of more complex and powerful neural network models has significantly advanced the state-of-the-art in visual object tracking. These advances can be attributed to deeper networks, or the introduction of new building blocks, such…

Computer Vision and Pattern Recognition · Computer Science 2022-10-05 Philippe Blatter, Menelaos Kanakis, Martin Danelljan, Luc Van Gool

Less is More: Token Context-aware Learning for Object Tracking

Recently, several studies have shown that utilizing contextual information to perceive target states is crucial for object tracking. They typically capture context by incorporating multiple video frames. However, these naive frame-context…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song

Towards Real-World Visual Tracking with Temporal Contexts

Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Xiangyang Yang, Dan Zeng, Xucheng Wang, You Wu, Hengzhou Ye, Qijun Zhao, Shuiwang Li

Context-Aware Token Pruning and Discriminative Selective Attention for Transformer Tracking

One-stream Transformer-based trackers have demonstrated remarkable performance by concatenating template and search region tokens, thereby enabling joint attention across all tokens. However, enabling an excessive proportion of background…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Janani Kugarajeevan, Thanikasalam Kokul, Amirthalingam Ramanan, Subha Fernando

UETrack: A Unified and Efficient Framework for Single Object Tracking

With growing real-world demands, efficient tracking has received increasing attention. However, most existing methods are limited to RGB inputs and struggle in multi-modal scenarios. Moreover, current multi-modal tracking approaches…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Ben Kang, Jie Zhao, Xin Chen, Wanting Geng, Bin Zhang, Lu Zhang, Dong Wang, Huchuan Lu

Enforcing Template Representability and Temporal Consistency for Adaptive Sparse Tracking

Sparse representation has been widely studied in visual tracking, which has shown promising tracking performance. Despite a lot of progress, the visual tracking problem is still a challenging task due to appearance variations over time. In…

Computer Vision and Pattern Recognition · Computer Science 2016-05-03 Xue Yang, Fei Han, Hua Wang, Hao Zhang

CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking

3D single object tracking (SOT) in LiDAR point clouds is a critical task in computer vision and autonomous driving. Despite great success having been achieved, the inherent sparsity of point clouds introduces a dual-redundancy challenge…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Sifan Zhou, Yichao Cao, Jiahao Nie, Yuqian Fu, Ziyu Zhao, Xiaobo Lu, Shuo Wang

Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI

Token compression techniques have recently emerged as powerful tools for accelerating Vision Transformer (ViT) inference in computer vision. Due to the quadratic computational complexity with respect to the token sequence length, these…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Phat Nguyen, Ngai-Man Cheung

General Compression Framework for Efficient Transformer Object Tracking

Previous works have attempted to improve tracking efficiency through lightweight architecture design or knowledge distillation from teacher models to compact student trackers. However, these solutions often sacrifice accuracy for speed to a…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, Kaixun Jiang, Zhaoyu Chen, Shuyong Gao, Runze Li, Xingdong Sheng, Wei Zhang, Hong Lu, Wenqiang Zhang

VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models

Token-based video representation has emerged as a promising approach for enabling large language models (LLMs) to interpret video content. However, existing token reduction techniques, such as pruning and merging, often disrupt essential…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Haichao Zhang, Yun Fu

TCTrack: Temporal Contexts for Aerial Tracking

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers. In this work, we present TCTrack, a comprehensive framework to fully exploit temporal contexts for aerial tracking. The temporal…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Robust Object Modeling for Visual Tracking

Object modeling has become a core part of recent tracking frameworks. Current popular trackers use Transformer attention to extract the template feature separately or interactively with the search region. However, separate template learning…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Yidong Cai, Jie Liu, Jie Tang, Gangshan Wu

CFTrack: Enhancing Lightweight Visual Tracking through Contrastive Learning and Feature Matching

Achieving both efficiency and strong discriminative ability in lightweight visual tracking is a challenge, especially on mobile and edge devices with limited computational resources. Conventional lightweight trackers often struggle with…

Computer Vision and Pattern Recognition · Computer Science 2025-02-28 Juntao Liang, Jun Hou, Weijun Zhang, Yong Wang

Exploring Dynamic Transformer for Efficient Object Tracking

The speed-precision trade-off is a critical problem for visual object tracking which usually requires low latency and deployment on constrained resources. Existing solutions for efficient tracking mainly focus on adopting light-weight…

Computer Vision and Pattern Recognition · Computer Science 2025-04-04 Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu

ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual Tracking

Recently, the transformer has enabled speed-oriented trackers to approach state-of-the-art (SOTA) performance at high speed thanks to the smaller input size or the lighter feature extraction backbone, though they still substantially…

Computer Vision and Pattern Recognition · Computer Science 2023-10-17 Yutong Kou, Jin Gao, Bing Li, Gang Wang, Weiming Hu, Yizheng Wang, Liang Li

Context-aware Deep Feature Compression for High-speed Visual Tracking

We propose a new context-aware correlation filter based tracking framework to achieve both high computational speed and state-of-the-art performance among real-time trackers. The major contribution to the high computational speed lies in…

Computer Vision and Pattern Recognition · Computer Science 2020-10-21 Jongwon Choi, Hyung Jin Chang, Tobias Fischer, Sangdoo Yun, Kyuewang Lee, Jiyeoup Jeong, Yiannis Demiris, Jin Young Choi

Revisiting Token Compression for Accelerating ViT-based Sparse Multi-View 3D Object Detectors

Vision Transformer (ViT)-based sparse multi-view 3D object detectors have achieved remarkable accuracy but still suffer from high inference latency due to heavy token processing. To accelerate these models, token compression has been widely…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Mingqian Ji, Shanshan Zhang, Jian Yang