Related papers: Multi-Modality Task Cascade for 3D Object Detectio…

3D Cascade RCNN: High Quality Object Detection in Point Clouds

Recent progress on 2D object detection has featured Cascade RCNN, which capitalizes on a sequence of cascade detectors to progressively improve proposal quality, towards high-quality object detection. However, there has not been evidence in…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Qi Cai, Yingwei Pan, Ting Yao, Tao Mei

Cross-Modality 3D Object Detection

In this paper, we focus on exploring the fusion of images and point clouds for 3D object detection in view of the complementary nature of the two modalities, i.e., images possess more semantic information while point clouds specialize in…

Computer Vision and Pattern Recognition · Computer Science 2020-08-25 Ming Zhu, Chao Ma, Pan Ji, Xiaokang Yang

FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection

3D object detection with multi-sensors is essential for an accurate and reliable perception system of autonomous driving and robotics. Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm which merely…

Computer Vision and Pattern Recognition · Computer Science 2022-09-23 Xinli Xu, Shaocong Dong, Lihe Ding, Jie Wang, Tingfa Xu, Jianan Li

Multi-Modal Attention-based Fusion Model for Semantic Segmentation of RGB-Depth Images

The 3D scene understanding is mainly considered as a crucial requirement in computer vision and robotics applications. One of the high-level tasks in 3D scene understanding is semantic segmentation of RGB-Depth images. With the availability…

Computer Vision and Pattern Recognition · Computer Science 2019-12-30 Fahimeh Fooladgar, Shohreh Kasaei

PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection

LiDAR-based 3D object detection is an important task for autonomous driving and current approaches suffer from sparse and partial point clouds of distant and occluded objects. In this paper, we propose a novel two-stage approach, namely…

Computer Vision and Pattern Recognition · Computer Science 2020-12-23 Yanan Zhang, Di Huang, Yunhong Wang

EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection

Recently, fusing the LiDAR point cloud and camera image to improve the performance and robustness of 3D object detection has received more and more attention, as these two modalities naturally possess strong complementarity. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Zhe Liu, Tengteng Huang, Bingling Li, Xiwu Chen, Xi Wang, Xiang Bai

MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection

Point clouds and images could provide complementary information when representing 3D objects. Fusing the two kinds of data usually helps to improve the detection results. However, it is challenging to fuse the two data modalities, due to…

Computer Vision and Pattern Recognition · Computer Science 2021-08-31 Xun Tan, Xingyu Chen, Guowei Zhang, Jishiyu Ding, Xuguang Lan

M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving

The perception system for autonomous driving generally requires to handle multiple diverse sub-tasks. However, current algorithms typically tackle individual sub-tasks separately, which leads to low efficiency when aiming at obtaining…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Xuesong Chen, Shaoshuai Shi, Tao Ma, Jingqiu Zhou, Simon See, Ka Chun Cheung, Hongsheng Li

Relation Graph Network for 3D Object Detection in Point Clouds

Convolutional Neural Networks (CNNs) have emerged as a powerful strategy for most object detection tasks on 2D images. However, their power has not been fully realised for detecting 3D objects in point clouds directly without converting…

Computer Vision and Pattern Recognition · Computer Science 2019-12-03 Mingtao Feng, Syed Zulqarnain Gilani, Yaonan Wang, Liang Zhang, Ajmal Mian

3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds

Semantic parsing of large-scale 3D point clouds is an important research topic in computer vision and remote sensing fields. Most existing approaches utilize hand-crafted features for each modality independently and combine them in a…

Computer Vision and Pattern Recognition · Computer Science 2017-07-24 Fangyu Liu, Shuaipeng Li, Liqiang Zhang, Chenghu Zhou, Rongtian Ye, Yuebin Wang, Jiwen Lu

Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition

Gesture recognition is getting more and more popular due to various application possibilities in human-machine interaction. Existing multi-modal gesture recognition systems take multi-modal data as input to improve accuracy, but such…

Computer Vision and Pattern Recognition · Computer Science 2021-11-01 Dinghao Fan, Hengjie Lu, Shugong Xu, Shan Cao

Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection

Accurate and robust object detection is critical for autonomous driving. Image-based detectors face difficulties caused by low visibility in adverse weather conditions. Thus, radar-camera fusion is of particular interest but presents…

Computer Vision and Pattern Recognition · Computer Science 2023-07-18 Huawei Sun, Hao Feng, Georg Stettinger, Lorenzo Servadei, Robert Wille

Multi-Modal 3D Object Detection by Box Matching

Multi-modal 3D object detection has received growing attention as the information from different sensors like LiDAR and cameras are complementary. Most fusion methods for 3D detection rely on an accurate alignment and calibration between 3D…

Computer Vision and Pattern Recognition · Computer Science 2023-05-16 Zhe Liu, Xiaoqing Ye, Zhikang Zou, Xinwei He, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic Segmentation on Point Cloud

Recently, multi-modality models have been introduced because of the complementary information from different sensors such as LiDAR and cameras. It requires paired data along with precise calibrations for all modalities, the complicated…

Computer Vision and Pattern Recognition · Computer Science 2023-09-22 Guan-Cheng Lee

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition

Recognizing objects and scenes are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors in handling these tasks has emerged as an important area of focus for better visual understanding.…

Computer Vision and Pattern Recognition · Computer Science 2022-01-12 Ali Caglayan, Nevrez Imamoglu, Ahmet Burak Can, Ryosuke Nakamura

xModel-KD: Cross-modal Knowledge Distillation for 3D Scene Perception using LiDAR

Point cloud segmentation is a fundamental task in 3D scene understanding. Its progress is constrained by the high cost and time required for dense 3D annotations, making labeled samples difficult to obtain. Beyond annotation scarcity,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Thenukan Pathmanathan, Kanchan Keisham, Thangarajah Akilan

Multimodal Object Detection using Depth and Image Data for Manufacturing Parts

Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Nazanin Mahjourian, Vinh Nguyen

Multi-modal Multi-task Pre-training for Improved Point Cloud Understanding

Recent advances in multi-modal pre-training methods have shown promising effectiveness in learning 3D representations by aligning multi-modal features between 3D shapes and their corresponding 2D counterparts. However, existing multi-modal…

Computer Vision and Pattern Recognition · Computer Science 2025-07-24 Liwen Liu, Weidong Yang, Lipeng Ma, Ben Fei

Trunk-branch Contrastive Network with Multi-view Deformable Aggregation for Multi-view Action Recognition

Multi-view action recognition aims to identify actions in a given multi-view scene. Traditional studies initially extracted refined features from each view, followed by implemented paired interaction and integration, but they potentially…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Yingyuan Yang, Guoyuan Liang, Can Wang, Xiaojun Wu

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene…

Computer Vision and Pattern Recognition · Computer Science 2022-12-14 Chaolong Yang, Yuyao Yan, Weiguang Zhao, Jianan Ye, Xi Yang, Amir Hussain, Kaizhu Huang