Related papers: Multiple Thinking Achieving Meta-Ability Decouplin…

ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

Many query-based approaches for 3D Multi-Object Tracking (MOT) adopt the tracking-by-attention paradigm, utilizing track queries for identity-consistent detection and object queries for identity-agnostic track spawning.…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-16 · Shuxiao Ding, Lukas Schneider, Marius Cordts, Juergen Gall

Multiple Object Tracking as ID Prediction

Multi-Object Tracking (MOT) has been a long-standing challenge in video understanding. A natural and intuitive approach is to split this task into two parts: object detection and association. Most mainstream methods employ meticulously…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Ruopeng Gao, Ji Qi, Limin Wang

MAT: Motion-Aware Multi-Object Tracking

Modern multi-object tracking (MOT) systems usually model the trajectories by associating per-frame detections. However, when camera motion, fast motion, and occlusion challenges occur, it is difficult to ensure long-range tracking or even…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-21 · Shoudong Han, Piao Huang, Hongwei Wang, En Yu, Donghaisheng Liu, Xiaofeng Pan, Jun Zhao

Spatio-Temporal Multi-Task Learning Transformer for Joint Moving Object Detection and Segmentation

Moving objects have special importance for Autonomous Driving tasks. Detecting moving objects can be posed as Moving Object Segmentation, by segmenting the object pixels, or Moving Object Detection, by generating a bounding box for the…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-23 · Eslam Mohamed, Ahmed El-Sallab

A Survey on Deep Multi-Task Learning in Connected Autonomous Vehicles

Connected autonomous vehicles (CAVs) must simultaneously perform multiple tasks, such as object detection, semantic segmentation, depth estimation, trajectory prediction, motion prediction, and behaviour prediction, to ensure safe and…

Robotics · Computer Science · 2025-08-07 · Jiayuan Wang, Farhad Pourpanah, Q. M. Jonathan Wu, Ning Zhang

Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking

Most existing multimodal trackers adopt uniform fusion strategies, overlooking the inherent differences between modalities. Moreover, they propagate temporal information through mixed tokens, leading to entangled and less discriminative…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-11 · Shilei Wang, Pujian Lai, Dong Gao, Jifeng Ning, Gong Cheng

RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation

Existing online multiple object tracking (MOT) algorithms often consist of two subtasks, detection and re-identification (ReID). In order to enhance the inference speed and reduce the complexity, current methods commonly integrate these…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-11 · En Yu, Zhuoling Li, Shoudong Han, Hongwei Wang

Multi-Agent Cooperative Learning for Robust Vision-Language Alignment under OOD Concepts

This paper introduces a novel Multi-Agent Cooperative Learning (MACL) framework to address cross-modal alignment collapse in vision-language models when handling out-of-distribution (OOD) concepts. Four core agents, including image, text,…

Multiagent Systems · Computer Science · 2026-04-08 · Philip Xu

End-to-end Tracking with a Multi-query Transformer

Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time. Our aim in this paper is to move beyond tracking-by-detection…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-27 · Bruno Korbar, Andrew Zisserman

Tracklet Association Tracker: An End-to-End Learning-based Association Approach for Multi-Object Tracking

Traditional multiple object tracking methods divide the task into two parts: affinity learning and data association. The separation of the task requires defining a hand-crafted training goal in the affinity learning stage and a hand-crafted…

Computer Vision and Pattern Recognition · Computer Science · 2018-08-07 · Han Shen, Lichao Huang, Chang Huang, Wei Xu

Online Multiple Object Tracking with Cross-Task Synergy

Modern online multiple object tracking (MOT) methods usually focus on two directions to improve tracking performance. One is to predict new positions in an incoming frame based on tracking information from previous frames, and the other is…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-02 · Song Guo, Jingya Wang, Xinchao Wang, Dacheng Tao

Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

Vision is well-known for its use in manipulation, especially using visual servoing. Due to the 3D nature of the world, using multiple camera views and merging them creates better representations for Q-learning and in turn, trains more…

Machine Learning · Computer Science · 2025-09-01 · Abdulaziz Almuzairee, Rohan Patil, Dwait Bhatt, Henrik I. Christensen

SoDA: Multi-Object Tracking with Soft Data Association

Robust multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars. Tracking objects, however, remains a highly challenging problem, especially in cluttered autonomous driving scenes in which objects tend to…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-20 · Wei-Chih Hung, Henrik Kretzschmar, Tsung-Yi Lin, Yuning Chai, Ruichi Yu, Ming-Hsuan Yang, Dragomir Anguelov

No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras & LiDARs

Online multi-object tracking (MOT) is extremely important for high-level spatial reasoning and path planning for autonomous and highly-automated vehicles. In this paper, we present a modular framework for tracking multiple objects…

Computer Vision and Pattern Recognition · Computer Science · 2019-02-20 · Akshay Rangesh, Mohan M. Trivedi

Embodied Multimodal Multitask Learning

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question…

Machine Learning · Computer Science · 2019-02-05 · Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment

Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics. Despite considerable advancements, existing MOT methodologies tend to falter…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-20 · Hamza Mukhtar, Muhammad Usman Ghani Khan

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Yet, it struggles in complex spatial reasoning tasks. Nonetheless,…

Computation and Language · Computer Science · 2025-01-14 · Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei

Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning

We present an efficient task and motion replanning approach for sequential multi-object manipulation in dynamic environments. Conventional Task And Motion Planning (TAMP) solvers experience an exponential increase in planning time as the…

Robotics · Computer Science · 2026-05-20 · Yan Zhang, Teng Xue, Amirreza Razmjoo, Sylvain Calinon

D-CAT: Decoupled Cross-Attention Transfer between Sensor Modalities for Unimodal Inference

Cross-modal transfer learning is used to improve multi-modal classification models (e.g., for human activity recognition in human-robot collaboration). However, existing methods require paired sensor data at both training and inference,…

Machine Learning · Computer Science · 2025-09-15 · Leen Daher, Zhaobo Wang, Malcolm Mielle

Deep Multi-Modal Sets

Many vision-related tasks benefit from reasoning over multiple modalities to leverage complementary views of data in an attempt to learn robust embedding spaces. Most deep learning-based methods rely on a late fusion technique whereby…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-04 · Austin Reiter, Menglin Jia, Pu Yang, Ser-Nam Lim