Related papers: EIMC: Efficient Instance-aware Multi-modal Collabo…

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint.…

Computer Vision and Pattern Recognition · Computer Science 2024-02-26 Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving

Autonomous driving holds transformative potential but remains fundamentally constrained by the limited perception and isolated decision-making with standalone intelligence. While recent multi-agent approaches introduce cooperation, they…

Robotics · Computer Science 2025-11-13 Ziyi Song, Chen Xia, Chenbing Wang, Haibao Yu, Sheng Zhou, Zhisheng Niu

BM2CP: Efficient Collaborative Perception with LiDAR-Camera Modalities

Collaborative perception enables agents to share complementary perceptual information with nearby agents. This would improve the perception performance and alleviate the issues of single-view perception, such as occlusion and sparsity. Most…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Binyu Zhao, Wei Zhang, Zhaonian Zou

End-to-End 3D Spatiotemporal Perception with Multimodal Fusion and V2X Collaboration

Multi-view cooperative perception and multimodal fusion are essential for reliable 3D spatiotemporal understanding in autonomous driving, especially under occlusions, limited viewpoints, and communication delays in V2X scenarios. This paper…

Computer Vision and Pattern Recognition · Computer Science 2025-12-29 Zhenwei Yang, Yibo Ai, Weidong Zhang

UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks essential for autonomous driving. In real-world driving scenarios, it's commonplace to…

Computer Vision and Pattern Recognition · Computer Science 2024-08-26 Jian Zou, Tianyu Huang, Guanglei Yang, Zhenhua Guo, Tao Luo, Chun-Mei Feng, Wangmeng Zuo

GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection

Multimodal Emotion Recognition in Conversation (ERC) plays an influential role in the field of human-computer interaction and conversational robotics since it can motivate machines to provide empathetic services. Multimodal data modeling is…

Multimedia · Computer Science 2023-11-23 Jiang Li, Xiaoping Wang, Guoqing Lv, Zhigang Zeng

Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic where the ego vehicle must have reliable object detection to avoid collision while its field of view is…

Robotics · Computer Science 2023-09-20 Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, Stewart Worrall

Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture

Multimodal sentiment analysis, a pivotal task in affective computing, seeks to understand human emotions by integrating cues from language, audio, and visual signals. While many recent approaches leverage complex attention mechanisms and…

Computation and Language · Computer Science 2025-05-09 Nischal Mandal, Yang Li

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge

This paper presents Edge-based Mixture of Experts (MoE) Collaborative Computing (EMC2), an optimal computing system designed for autonomous vehicles (AVs) that simultaneously achieves low-latency and high-accuracy 3D object detection.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-23 Linshen Liu, Boyan Su, Junyue Jiang, Guanlin Wu, Cong Guo, Ceyu Xu, Hao Frank Yang

HEAD: A Bandwidth-Efficient Cooperative Perception Approach for Heterogeneous Connected and Autonomous Vehicles

In cooperative perception studies, there is often a trade-off between communication bandwidth and perception performance. While current feature fusion solutions are known for their excellent object detection performance, transmitting the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-29 Deyuan Qu, Qi Chen, Yongqi Zhu, Yihao Zhu, Sergei S. Avedisov, Song Fu, Qing Yang

mmFUSION: Multimodal Fusion for 3D Objects Detection

Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems. Camera and LiDAR are the most commonly used sensors, and usually, their fusion happens at the early or late stages of 3D detectors with the help of…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Javed Ahmad, Alessio Del Bue

EPIC: Efficient Prompt Interaction for Text-Image Classification

In recent years, large-scale pre-trained multimodal models (LMMs) generally emerge to integrate the vision and language modalities, achieving considerable success in multimodal tasks, such as text-image classification. The growing size of…

Computer Vision and Pattern Recognition · Computer Science 2025-07-11 Xinyao Yu, Hao Sun, Zeyu Ling, Ziwei Niu, Zhenjia Bai, Rui Qin, Yen-Wei Chen, Lanfen Lin

Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors

3D object detection is a common function within the perception system of an autonomous vehicle and outputs a list of 3D bounding boxes around objects of interest. Various 3D object detection methods have relied on fusion of different sensor…

Computer Vision and Pattern Recognition · Computer Science 2020-11-02 Eduardo Arnold, Mehrdad Dianati, Robert de Temple, Saber Fallah

ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection

Effective feature fusion of multispectral images plays a crucial role in multi-spectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are…

Computer Vision and Pattern Recognition · Computer Science 2023-08-16 Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang

Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems

Integrated sensing and communication (ISAC) systems operating at terahertz (THz) bands are envisioned to enable both ultra-high data-rate communication and precise environmental awareness for next-generation wireless networks. However, the…

Signal Processing · Electrical Eng. & Systems 2025-05-06 Kai Zhang, Wentao Yu, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

EffiComm: Bandwidth Efficient Multi Agent Communication

Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots. Yet transmitting raw point clouds or full feature maps overwhelms Vehicle-to-Vehicle (V2V) communications, causing…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Melih Yazgan, Allen Xavier Arasan, J. Marius Zöllner

E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications.…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Jiaqing Zhang, Mingxiang Cao, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li, Xue Yang

Attention Based Feature Fusion For Multi-Agent Collaborative Perception

In the domain of intelligent transportation systems (ITS), collaborative perception has emerged as a promising approach to overcome the limitations of individual perception by enabling multiple agents to exchange information, thus enhancing…

Multiagent Systems · Computer Science 2023-05-04 Ahmed N. Ahmed, Siegfried Mercelis, Ali Anwar

A Cooperative Perception System Robust to Localization Errors

Cooperative perception is challenging for safety-critical autonomous driving applications. The errors in the shared position and pose cause an inaccurate relative transform estimation and disrupt the robust mapping of the Ego vehicle. We…

Multiagent Systems · Computer Science 2023-04-27 Zhiying Song, Fuxi Wen, Hailiang Zhang, Jun Li