Related papers: Multi-Modality Driven LoRA for Adverse Condition D…

DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding

Fusing sensors with complementary modalities is crucial for maintaining a stable and comprehensive understanding of abnormal driving scenes. However, Multimodal Large Language Models (MLLMs) are underexplored for leveraging multi-sensor…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Mingzhe Tao, Ruiping Liu, Junwei Zheng, Yufan Chen, Kedi Ying, M. Saquib Sarfraz, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

An Unsupervised Domain Adaptive Approach for Multimodal 2D Object Detection in Adverse Weather Conditions

Integrating different representations from complementary sensing modalities is crucial for robust scene interpretation in autonomous driving. While deep learning architectures that fuse vision and range data for 2D object detection have…

Computer Vision and Pattern Recognition · Computer Science 2022-03-08 George Eskandar, Robert A. Marsden, Pavithran Pandiyan, Mario Döbler, Karim Guirguis, Bin Yang

Enhancing CLIP Robustness via Cross-Modality Alignment

Vision-language models (VLMs) such as CLIP demonstrate strong generalization in zero-shot classification but remain highly vulnerable to adversarial perturbations. Existing methods primarily focus on adversarial fine-tuning or prompt…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Xingyu Zhu, Beier Zhu, Shuo Wang, Kesen Zhao, Hanwang Zhang

Multi-level Domain Adaptation for Lane Detection

We focus on bridging domain discrepancy in lane detection among different scenarios to greatly reduce extra annotation and re-training costs for autonomous driving. Critical factors hinder the performance improvement of cross-domain lane…

Computer Vision and Pattern Recognition · Computer Science 2022-11-10 Chenguang Li, Boheng Zhang, Jia Shi, Guangliang Cheng

Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather

Monocular depth estimation is critical for applications such as autonomous driving and scene reconstruction. While existing methods perform well under normal scenarios, their performance declines in adverse weather, due to challenging…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Kui Jiang, Jing Cao, Zhaocheng Yu, Junjun Jiang, Jingchun Zhou

Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors

Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger

UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions

Visual object tracking has gained promising progress in past decades. Most of the existing approaches focus on learning target representation in well-conditioned daytime data, while for the unconstrained real-world scenarios with adverse…

Computer Vision and Pattern Recognition · Computer Science 2025-07-02 Siyuan Yao, Rui Zhu, Ziqi Wang, Wenqi Ren, Yanyang Yan, Xiaochun Cao

ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions

The fusion of multimodal sensor data streams such as camera images and lidar point clouds plays an important role in the operation of autonomous vehicles (AVs). Robust perception across a range of adverse weather and lighting conditions is…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Shounak Sural, Nishad Sahu, Ragunathan Rajkumar

ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models

The success of CLIP-like vision-language models (VLMs) on natural images has inspired medical counterparts, yet existing approaches largely fall into two extremes: specialist models trained on single-domain data, which capture…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 M. Arda Aydın, Melih B. Yilmaz, Aykut Koç, Tolga Çukur

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

We focus on improving the visual understanding capability for boosting the vision-language models. We propose Arcana, a multiModal language model, which introduces two crucial techniques. First, we present Multimodal LoRA…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Yanpeng Sun, Huaxin Zhang, Qiang Chen, Xinyu Zhang, Nong Sang, Gang Zhang, Jingdong Wang, Zechao Li

Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets

High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by the low driving scores on the CARLA Leaderboard. Despite collision-related infractions being the dominant failure mode in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Alex Koran, Dimitrios Sinodinos, Hadi Hojjati, Takuya Nanri, Fangge Chen, Narges Armanfard

DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation

Vision-Language Models (VLMs) are foundational to critical applications like autonomous driving, medical diagnosis, and content moderation. While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA enable their efficient adaptation to…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Ved Umrajkar

Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation

All-weather image restoration (AWIR) is crucial for reliable autonomous navigation under adverse weather conditions. AWIR models are trained to address a specific set of weather conditions such as fog, rain, and snow. But this causes them…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Sudarshan Rajagopalan, Vishal M. Patel

Cross-Modal Attention Analysis and Optimization in Vision-Language Models: A Study on Visual Reliability

Vision-Language Models (VLMs) achieve strong cross-modal performance, yet recent evidence suggests they over-rely on textual descriptions while under-utilizing visual evidence, a phenomenon termed "text shortcut learning." We propose an…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Lijie Zhou

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Conditional inference on joint textual and visual clues is a multi-modal reasoning task in which textual clues provide prior permutation or external knowledge, which is complementary to visual content and pivotal to deducing the correct…

Computation and Language · Computer Science 2023-05-09 Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang

Multimodal End-to-End Learning for Autonomous Steering in Adverse Road and Weather Conditions

Autonomous driving is challenging in adverse road and weather conditions in which there might not be lane lines, the road might be covered in snow and the visibility might be poor. We extend the previous work on end-to-end learning for…

Computer Vision and Pattern Recognition · Computer Science 2021-06-30 Jyri Maanpää, Josef Taher, Petri Manninen, Leo Pakola, Iaroslav Melekhov, Juha Hyyppä

Multi-Modal Guided Multi-Source Domain Adaptation for Object Detection

General object detection (OD) struggles to detect objects in the target domain that differ from the training distribution. To address this, recent studies demonstrate that training from multiple source domains and explicitly processing them…

Computer Vision and Pattern Recognition · Computer Science 2026-05-14 Sangin Lee, Seokjun Kwon, Jeongmin Shin, Namil Kim, Yukyung Choi

Condition directed Multi-domain Adversarial Learning for Loop Closure Detection

Loop closure detection (LCD) is the key module in appearance-based simultaneous localization and mapping (SLAM). However, in real life, the appearance of visual inputs is usually affected by the illumination changes and texture…

Robotics · Computer Science 2017-11-22 Peng Yin, Yuqing He, Na Liu, Jianda Han

MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings

Multimodal embedding models, built upon causal Vision Language Models (VLMs), have shown promise in various tasks. However, current approaches face three key limitations: the use of causal attention in VLM backbones is suboptimal for…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Haonan Chen, Hong Liu, Yuping Luo, Liang Wang, Nan Yang, Furu Wei, Zhicheng Dou

Self-Adaptive Driving in Nonstationary Environments through Conjectural Online Lookahead Adaptation

Powered by deep representation learning, reinforcement learning (RL) provides an end-to-end learning framework capable of solving self-driving (SD) tasks without manual designs. However, time-varying nonstationary environments cause…

Robotics · Computer Science 2023-03-09 Tao Li, Haozhe Lei, Quanyan Zhu