Related papers: SparseFusion: Fusing Multi-Modal Sparse Representa…

SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception

Multi-modal 3D object detection has exhibited significant progress in recent years. However, most existing methods can hardly scale to long-range scenarios due to their reliance on dense 3D features, which substantially escalate…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-18 · Yiheng Li, Hongyang Li, Zehao Huang, Hong Chang, Naiyan Wang

SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection

Sparse 3D detectors have received significant attention since the query-based paradigm embraces low latency without explicit dense BEV feature construction. However, these detectors achieve worse performance than their dense counterparts.…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-11 · Hongcheng Zhang, Liu Liang, Pengxin Zeng, Xiao Song, Zhe Wang

Sparse Dense Fusion for 3D Object Detection

With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection. Although multiple fusion approaches have been proposed, they can be classified into either sparse-only or dense-only fashion based…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-11 · Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li

Fully Sparse Fusion for 3D Object Detection

Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps is quadratic to the detection range, making it…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-30 · Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection

The combination of LiDAR and camera modalities is proven to be necessary and typical for 3D object detection according to recent studies. Existing fusion strategies tend to overly rely on the LiDAR modal in essence, which exploits the…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-20 · Yang Yang, Weijie Ma, Hao Chen, Linlin Ou, Xinyi Yu

FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving

The integration of data from diverse sensor modalities (e.g., camera and LiDAR) constitutes a prevalent methodology within the ambit of autonomous driving scenarios. Recent advancements in efficient point cloud transformers have underscored…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-01 · Yutao Zhu, Xiaosong Jia, Xinyu Yang, Junchi Yan

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. This is challenging due to the difficulty of combining multi-granularity geometric and semantic features…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-06 · Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

InsFusion: Rethink Instance-level LiDAR-Camera Fusion for 3D Object Detection

Three-dimensional Object Detection from multi-view cameras and LiDAR is a crucial component for autonomous driving and smart transportation. However, in the process of basic feature extraction, perspective transformation, and feature…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · Zhongyu Xia, Hansong Yang, Yongtao Wang

Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion

Bird-eye-view (BEV) based methods have made great progress recently in multi-view 3D detection task. Comparing with BEV based methods, sparse based methods lag behind in performance, but still have lots of non-negligible merits. To push…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-13 · Xuewu Lin, Tianwei Lin, Zixiang Pei, Lichao Huang, Zhizhong Su

LiteFusion: Taming 3D Object Detectors from Vision-Based to Multi-Modal with Minimal Adaptation

3D object detection is fundamental for safe and robust intelligent transportation systems. Current multi-modal 3D object detectors often rely on complex architectures and training strategies to achieve higher detection accuracy. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-24 · Xiangxuan Ren, Zhongdao Wang, Pin Tang, Guoqing Wang, Jilai Zheng, Chao Ma

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

LiDAR and camera are two important sensors for 3D object detection in autonomous driving. Despite the increasing popularity of sensor fusion in this field, the robustness against inferior image conditions, e.g., bad illumination and sensor…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-23 · Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, Chiew-Lan Tai

DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars

We propose DeepFusion, a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily,…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-28 · Florian Drews, Di Feng, Florian Faion, Lars Rosenbaum, Michael Ulrich, Claudius Gläser

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. Our strategy involves a data…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-01 · Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-17 · Zhizhuo Zhou, Shubham Tulsiani

SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection

Most previous 3D object detection methods that leverage the multi-modality of LiDAR and cameras utilize the Bird's Eye View (BEV) space for intermediate feature representation. However, this space uses a low x, y-resolution and sacrifices…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-12 · Hyeongseok Son, Jia He, Seung-In Park, Ying Min, Yunhao Zhang, ByungIn Yoo

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. While prevalent multi-modal methods simply decorate raw lidar point clouds with camera features and feed them directly to…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-17 · Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan

SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos

Camera-based 3D object detection in BEV (Bird's Eye View) space has drawn great attention over the past few years. Dense detectors typically follow a two-stage pipeline by first constructing a dense BEV feature and then performing object…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-06 · Haisong Liu, Yao Teng, Tao Lu, Haiguang Wang, Limin Wang

Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection

LiDAR and camera fusion techniques are promising for achieving 3D object detection in autonomous driving. Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds to enhance…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-21 · Shaoqing Xu, Fang Li, Ziying Song, Jin Fang, Sifen Wang, Zhi-Xin Yang

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Recent years have witnessed the remarkable progress of 3D multi-modality object detection methods based on the Bird's-Eye-View (BEV) perspective. However, most of them overlook the complementary interaction and guidance between LiDAR and…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-04 · Xiaotian Li, Baojie Fan, Jiandong Tian, Huijie Fan

Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth Estimation and 3D Object Detection

The ability to accurately detect and localize objects is recognized as being the most important for the perception of self-driving cars. From 2D to 3D object detection, the most difficult is to determine the distance from the ego-vehicle to…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-31 · Nguyen Anh Minh Mai, Pierre Duthon, Louahdi Khoudour, Alain Crouzil, Sergio A. Velastin