Related papers: Multi-Granularity Language-Guided Training for Mul…
Vision-Language MOT is a crucial tracking problem and has drawn increasing attention recently. It aims to track objects based on human language commands, replacing the traditional use of templates or pre-set information from training sets…
Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain,…
While Multi-Object Tracking (MOT) has made substantial advancements, it is limited by heavy reliance on prior knowledge and limited to predefined categories. In contrast, Generic Multiple Object Tracking (GMOT), tracking multiple objects…
Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, \eg vision (RGB), depth, thermal infrared, event, language and audio, to estimate the state of an arbitrary object in a video sequence. It…
Multimodal vision-language (VL) learning has noticeably pushed the tendency toward generic intelligence owing to emerging large foundation models. However, tracking, as a fundamental vision problem, surprisingly enjoys less bonus from…
Multiple object tracking (MOT) is a crucial task in computer vision society. However, most tracking-by-detection MOT methods, with available detected bounding boxes, cannot effectively handle static, slow-moving and fast-moving camera…
Multi-modal large language models have demonstrated impressive performance across various tasks in different modalities. However, existing multi-modal models primarily emphasize capturing global information within each modality while…
Multi-sensor perception is crucial to ensure the reliability and accuracy in autonomous driving system, while multi-object tracking (MOT) improves that by tracing sequential movement of dynamic objects. Most current approaches for…
Multi-Object Tracking (MOT) remains a vital component of intelligent video analysis, which aims to locate targets and maintain a consistent identity for each target throughout a video sequence. Existing works usually learn a discriminative…
Multi-Object Tracking (MOT) is evolving from geometric localization to Semantic MOT (SMOT) to answer complex relational queries, yet progress is hindered by semantic data scarcity and a structural disconnect between tracking architectures…
Current multi-object tracking (MOT) algorithms typically overlook issues inherent in low-quality videos, leading to significant degradation in tracking performance when confronted with real-world image deterioration. Therefore, advancing…
The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames, while maintaining a unique identity for each object. Most existing methods rely on the spatial-temporal motion features and appearance…
Referring multi-object tracking (RMOT) is an emerging cross-modal task that aims to localize an arbitrary number of targets based on a language expression and continuously track them in a video. This intricate task involves reasoning on…
Despite recent progress, Multi-Object Tracking (MOT) continues to face significant challenges, particularly its dependence on prior knowledge and predefined categories, complicating the tracking of unfamiliar objects. Generic Multiple…
Motion-based association for Multi-Object Tracking (MOT) has recently re-achieved prominence with the rise of powerful object detectors. Despite this, little work has been done to incorporate appearance cues beyond simple heuristic models…
3D Multi-Object Tracking (MOT) provides the trajectories of surrounding objects, assisting robots or vehicles in smarter path planning and obstacle avoidance. Existing 3D MOT methods based on the Tracking-by-Detection framework typically…
The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few…
Referring Multi-Object Tracking (RMOT) extends conventional multi-object tracking (MOT) by introducing natural language references for multi-modal fusion tracking. RMOT benchmarks only describe the object's appearance, relative positions,…
Object detection has long been a topic of high interest in computer vision literature. Motivated by the fact that annotating data for the multi-object tracking (MOT) problem is immensely expensive, recent studies have turned their attention…
Multi-Object Tracking (MOT) is a fundamental task in computer vision, aiming to track targets across video frames. Existing MOT methods perform well in general visual scenes, but face significant challenges and limitations when extended to…