Related papers: DeepInteraction++: Multi-Modality Interaction for …

DeepInteraction: 3D Object Detection via Modality Interaction

Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy. This design is however fundamentally restricted due to overlooking the modality-specific useful information and finally hampering the model…

Computer Vision and Pattern Recognition · Computer Science 2022-12-09 Zeyu Yang, Jiaqi Chen, Zhenwei Miao, Wei Li, Xiatian Zhu, Li Zhang

MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition

Driver action recognition, aiming to accurately identify drivers' behaviours, is crucial for enhancing driver-vehicle interactions and ensuring driving safety. Unlike general action recognition, drivers' environments are often challenging,…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Ruoyu Wang, Wenqian Wang, Jianjun Gao, Dan Lin, Kim-Hui Yap, Bingbing Li

Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real-world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by…

Computer Vision and Pattern Recognition · Computer Science 2021-11-05 Abhinav Valada, Rohit Mohan, Wolfram Burgard

MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S. Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi Pang Lam, Dragomir Anguelov, Benjamin Sapp

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey

Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers. However, achieving a rather good performance is not an easy task due to the noisy raw data,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li

Multi-modal Motion Prediction with Transformer-based Neural Network for Autonomous Driving

Predicting the behaviors of other agents on the road is critical for autonomous driving to ensure safety and efficiency. However, the challenging part is how to represent the social interactions between agents and output different possible…

Robotics · Computer Science 2021-09-15 Zhiyu Huang, Xiaoyu Mo, Chen Lv

Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical…

Machine Learning · Computer Science 2025-07-29 Ziyi Liang, Annie Qu, Babak Shahbaba

Deep Equilibrium Multimodal Fusion

Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently. Most existing fusion approaches either learn a fixed fusion strategy during training and inference, or are…

Computer Vision and Pattern Recognition · Computer Science 2023-06-30 Jinhong Ni, Yalong Bai, Wei Zhang, Ting Yao, Tao Mei

A Unified Multi-scale and Multi-task Learning Framework for Driver Behaviors Reasoning

Mutual understanding between driver and vehicle is critically important to the design of intelligent vehicles and customized interaction interface. In this study, a unified driver behavior reasoning system toward multi-scale and multi-tasks…

Systems and Control · Electrical Eng. & Systems 2020-03-23 Yang Xing, Chen Lv, Dongpu Cao, Efstathios Velenis

Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs,…

Robotics · Computer Science 2020-07-20 Di Feng, Christian Haase-Schütz, Lars Rosenbaum, Heinz Hertlein, Claudius Glaeser, Fabian Timm, Werner Wiesbeck, Klaus Dietmayer

Learning Topological Interactions for Multi-Class Medical Image Segmentation

Deep learning methods have achieved impressive performance for multi-class medical image segmentation. However, they are limited in their ability to encode topological interactions among different classes (e.g., containment and exclusion).…

Computer Vision and Pattern Recognition · Computer Science 2022-07-21 Saumya Gupta, Xiaoling Hu, James Kaan, Michael Jin, Mutshipay Mpoy, Katherine Chung, Gagandeep Singh, Mary Saltz, Tahsin Kurc, Joel Saltz, Apostolos Tassiopoulos, Prateek Prasanna, Chao Chen

Multi-modal Integrated Prediction and Decision-making with Adaptive Interaction Modality Explorations

Navigating dense and dynamic environments poses a significant challenge for autonomous driving systems, owing to the intricate nature of multimodal interaction, wherein the actions of various traffic participants and the autonomous vehicle…

Robotics · Computer Science 2024-08-29 Tong Li, Lu Zhang, Sikang Liu, Shaojie Shen

Multi-modal Sensor Fusion-Based Deep Neural Network for End-to-end Autonomous Driving with Scene Understanding

This study aims to improve the performance and generalization capability of end-to-end autonomous driving with scene understanding leveraging deep learning and multimodal sensor fusion techniques. The designed end-to-end deep neural network…

Robotics · Computer Science 2020-08-04 Zhiyu Huang, Chen Lv, Yang Xing, Jingda Wu

Meta-Transformer: A Unified Framework for Multimodal Learning

Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

Learning Unseen Modality Interaction

Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

Shared Cross-Modal Trajectory Prediction for Autonomous Driving

Predicting future trajectories of traffic agents in highly interactive environments is an essential and challenging problem for the safe operation of autonomous driving systems. On the basis of the fact that self-driving vehicles are…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Chiho Choi, Joon Hee Choi, Jiachen Li, Srikanth Malla

Shared Cross-Modal Trajectory Prediction for Autonomous Driving

Predicting future trajectories of traffic agents in highly interactive environments is an essential and challenging problem for the safe operation of autonomous driving systems. On the basis of the fact that self-driving vehicles are…

Computer Vision and Pattern Recognition · Computer Science 2021-06-15 Chiho Choi, Joon Hee Choi, Srikanth Malla, Jiachen Li

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable, a lack of which would result in vulnerability to rare but complex…

Computer Vision and Pattern Recognition · Computer Science 2022-12-08 Hao Shao, Letian Wang, RuoBing Chen, Hongsheng Li, Yu Liu

Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition

Multimodal human action understanding is a significant problem in computer vision, with the central challenge being the effective utilization of the complementarity among diverse modalities while maintaining model efficiency. However, most…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Hongsong Wang, Heng Fei, Bingxuan Dai, Jie Gui

M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving

End-to-end autonomous driving has witnessed remarkable progress. However, the extensive deployment of autonomous vehicles has yet to be realized, primarily due to 1) inefficient multi-modal environment perception: how to integrate data from…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Dongyang Xu, Haokun Li, Qingfan Wang, Ziying Song, Lei Chen, Hanming Deng