DeepInteraction: 3D Object Detection via Modality Interaction

Computer Vision and Pattern Recognition 2022-12-09 v4

Abstract

Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design, however, is fundamentally restricted because it overlooks useful modality-specific information, which ultimately hampers model performance. To address this limitation, we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, so that their unique characteristics can be exploited during object detection. To realize this strategy, we design the DeepInteraction architecture, characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that our method surpasses all prior art, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.
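The contrast between fusing modalities into one representation and letting them interact while each is kept separate can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cross_attend`, `interaction_layer`), the plain dot-product attention, the residual update, and the tiny hand-written feature vectors are all assumptions made for illustration. The only point it carries over from the abstract is that each modality is enhanced by the other and both representations are maintained across layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """For each query vector, return a softmax-weighted average of the context vectors."""
    dim = len(context[0])
    out = []
    for q in queries:
        # Dot-product similarity between this query and every context vector.
        scores = [sum(a * b for a, b in zip(q, c)) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[d] for w, c in zip(weights, context)) for d in range(dim)])
    return out


def interaction_layer(lidar, camera):
    """Hypothetical layer: each stream is enhanced by the other, and BOTH are returned."""
    # Both context terms are computed from the incoming features before either is updated.
    lidar_ctx = cross_attend(lidar, camera)
    camera_ctx = cross_attend(camera, lidar)
    # Residual update keeps each modality's own features alongside the borrowed context.
    new_lidar = [[x + y for x, y in zip(f, c)] for f, c in zip(lidar, lidar_ctx)]
    new_camera = [[x + y for x, y in zip(f, c)] for f, c in zip(camera, camera_ctx)]
    return new_lidar, new_camera


# Toy features (assumed): 3 LiDAR tokens and 2 camera tokens, each 2-dimensional.
lidar = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
camera = [[0.5, 0.5], [1.0, -1.0]]

# Stack two interaction layers; the two streams are never merged into one.
for _ in range(2):
    lidar, camera = interaction_layer(lidar, camera)
```

A fusion-style design would instead return a single merged feature set from `interaction_layer`; in this sketch the LiDAR and camera streams keep their own token counts and remain separately available to a downstream decoder.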

Cite

@article{arxiv.2208.11112,
  title  = {DeepInteraction: 3D Object Detection via Modality Interaction},
  author = {Zeyu Yang and Jiaqi Chen and Zhenwei Miao and Wei Li and Xiatian Zhu and Li Zhang},
  journal= {arXiv preprint arXiv:2208.11112},
  year   = {2022}
}

Comments

To appear at NeurIPS 2022. 16 pages, 7 figures
