Related papers: DeepInteraction: 3D Object Detection via Modality …

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Existing top-performance autonomous driving systems typically rely on the multi-modal fusion strategy for reliable scene understanding. This design is however fundamentally restricted due to overlooking the modality-specific strengths and…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-24 · Zeyu Yang, Nan Song, Wei Li, Xiatian Zhu, Li Zhang, Philip H. S. Torr

InterTrack: Interaction Transformer for 3D Multi-Object Tracking

3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-09 · John Willes, Cody Reading, Steven L. Waslander

MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection

The rise of autonomous vehicles has significantly increased the demand for robust 3D object detection systems. While cameras and LiDAR sensors each offer unique advantages--cameras provide rich texture information and LiDAR offers precise…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-04 · Zitian Wang, Zehao Huang, Yulu Gao, Naiyan Wang, Si Liu

Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

Recent advancements in 3D object detection have benefited from multi-modal information from the multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-20 · Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. This is challenging due to the difficulty of combining multi-granularity geometric and semantic features…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-06 · Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Learning Unseen Modality Interaction

Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-26 · Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

Research on Image Recognition Technology Based on Multimodal Deep Learning

This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-07 · Jinyin Wang, Xingchen Li, Yixuan Jin, Yihao Zhong, Keke Zhang, Chang Zhou

Unifying Voxel-based Representation with Transformer for 3D Object Detection

In this work, we present a unified framework for multi-modality 3D object detection, named UVTR. The proposed method aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia

DGFusion: Dual-guided Fusion for Robust Multi-Modal 3D Object Detection

As a critical task in autonomous driving perception systems, 3D object detection is used to identify and track key objects, such as vehicles and pedestrians. However, detecting distant, small, or occluded objects (hard instances) remains a…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-14 · Feiyang Jia, Caiyan Jia, Ailin Liu, Shaoqing Xu, Qiming Xia, Lin Liu, Lei Yang, Yan Gong, Ziying Song

DeepFusionMOT: A 3D Multi-Object Tracking Framework Based on Camera-LiDAR Fusion with Deep Association

In the recent literature, on the one hand, many 3D multi-object tracking (MOT) works have focused on tracking accuracy and neglected computation speed, commonly by designing rather complex cost functions and feature extractors. On the other…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-29 · Xiyang Wang, Chunyun Fu, Zhankun Li, Ying Lai, Jiawei He

Progressive Multi-Modal Fusion for Robust 3D Object Detection

Multi-sensor fusion is crucial for accurate 3D object detection in autonomous driving, with cameras and LiDAR being the most commonly used sensors. However, existing methods perform sensor fusion in a single view by projecting features from…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-11 · Rohit Mohan, Daniele Cattaneo, Florian Drews, Abhinav Valada

Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques

Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-31 · Weide Liu, Wei Zhou, Jun Liu, Ping Hu, Jun Cheng, Jungong Han, Weisi Lin

mmFUSION: Multimodal Fusion for 3D Objects Detection

Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems. Camera and LiDAR are the most commonly used sensors, and usually, their fusion happens at the early or late stages of 3D detectors with the help of…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-08 · Javed Ahmad, Alessio Del Bue

FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal Consistent Transformer for 3D Object Detection

Multi-sensor modal fusion has demonstrated strong advantages in 3D object detection tasks. However, existing methods that fuse multi-modal features require transforming features into the bird's eye view space and may lose certain…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-10 · Chunyong Hu, Hang Zheng, Kun Li, Jianyun Xu, Weibo Mao, Maochun Luo, Lingxuan Wang, Mingxia Chen, Qihao Peng, Kaixuan Liu, Yiru Zhao, Peihan Hao, Minzhe Liu, Kaicheng Yu

DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars

We propose DeepFusion, a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily,…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-28 · Florian Drews, Di Feng, Florian Faion, Lars Rosenbaum, Michael Ulrich, Claudius Gläser

MMDR: A Result Feature Fusion Object Detection Approach for Autonomous System

Object detection has been extensively utilized in autonomous systems in recent years, encompassing both 2D and 3D object detection. Recent research in this field has primarily centered around multimodal approaches for addressing this…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-20 · Wendong Zhang

CrossOver: 3D Scene Cross-Modal Alignment

Multi-modal 3D object understanding has gained significant attention, yet current approaches often assume complete data availability and rigid alignment across all modalities. We present CrossOver, a novel framework for cross-modal 3D scene…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-08 · Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni

DEFT: Detection Embeddings for Tracking

Most modern multiple object tracking (MOT) systems follow the tracking-by-detection paradigm, consisting of a detector followed by a method for associating detections into tracks. There is a long history in tracking of combining motion and…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Mohamed Chaabane, Peter Zhang, J. Ross Beveridge, Stephen O'Hara

Attend and Interact: Higher-Order Object Interactions for Video Understanding

Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation…

Computer Vision and Pattern Recognition · Computer Science · 2018-03-22 · Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion

An important paradigm in 3D object detection is the use of multiple modalities to enhance accuracy in both normal and challenging conditions, particularly for long-tail scenarios. To address this, recent studies have explored two directions…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-17 · Minkyoung Cho, Yulong Cao, Jiachen Sun, Qingzhao Zhang, Marco Pavone, Jeong Joon Park, Heng Yang, Z. Morley Mao