Related papers: MAPLE: Modality-Aware Post-training and Learning E…

MaPLe: Multi-modal Prompt Learning

Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-04 · Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

Audio and omni-modal large language models exhibit impressive cross-modal reasoning capabilities. However, applying standard reinforcement learning post-training algorithms to these models exposes a critical structural vulnerability:…

Computation and Language · Computer Science · 2026-05-28 · Cihan Xiao, Yiwen Shao, Chenxing Li, Xiang He, Zhenwen Liang, Steve Yves, Sanjeev Khudanpur, Liefeng Bo

MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-16 · Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Despite Contrastive Language-Image Pretraining (CLIP)'s remarkable capability to retrieve content across modalities, a substantial modality gap persists in its feature space. Intriguingly, we discover that off-the-shelf MLLMs (Multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-01 · Pengfei Zhao, Rongbo Luan, Wei Zhang, Peng Wu, Sifeng He

MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving

Vision-language-action (VLA) models are effective as end-to-end motion planners, but can be brittle when evaluated in closed-loop settings due to being trained under a traditional imitation learning framework. Existing closed-loop supervision…

Robotics · Computer Science · 2026-05-21 · Rajeev Yasarla, Deepti Hegde, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Meysam Sadeghigooghari, Hanno Ackermann, Litian Liu, Pranav Desai, Fatih Porikli, Mohammad Ghavamzadeh, Hong Cai

MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection

Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing…

Machine Learning · Computer Science · 2024-10-10 · Niki Nezakati, Md Kaykobad Reza, Ameya Patil, Mashhour Solh, M. Salman Asif

Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation

Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-14 · Md Kaykobad Reza, Ashley Prater-Bennette, M. Salman Asif

MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models are primarily focused on studying prompt generation and prompt strategies with…

Artificial Intelligence · Computer Science · 2024-09-10 · Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang

MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning

In-Context Learning (ICL) empowers Large Language Models (LLMs) to tackle diverse tasks by incorporating multiple input-output examples, known as demonstrations, into the input of LLMs. More recently, advancements in the expanded context…

Artificial Intelligence · Computer Science · 2025-05-27 · Zihan Chen, Song Wang, Zhen Tan, Jundong Li, Cong Shen

MAPLE: A Framework for Active Preference Learning Guided by Large Language Models

The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational burdens, taxing human supervision, and lack of…

Machine Learning · Computer Science · 2024-12-23 · Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein

Label What Matters: Modality-Balanced and Difficulty-Aware Multimodal Active Learning

Multimodal learning integrates complementary information from different modalities such as image, text, and audio to improve model performance, but its success relies on large-scale labeled data, which is costly to obtain. Active learning…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-27 · Yuqiao Zeng, Xu Wang, Tengfei Liang, Yiqing Hao, Yi Jin, Hui Yu

MAPLE: Self-Supervised Learning-Enhanced Nonlinear Dimensionality Reduction for Visual Analysis

We present a new nonlinear dimensionality reduction method, MAPLE, that enhances UMAP by improving manifold modeling. MAPLE employs a self-supervised learning approach to more efficiently encode low-dimensional manifold geometry. Central to…

Machine Learning · Computer Science · 2026-05-15 · Zeyang Huang, Takanori Fujiwara, Angelos Chatzimparmpas, Wandrille Duchemin, Andreas Kerren

Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks

RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences, demonstrating exceptional and measurable efficacy in instruction following tasks; however, it exhibits insufficient compliance…

Artificial Intelligence · Computer Science · 2025-05-20 · Ruopei Sun, Jianfeng Cai, Jinhua Zhu, Kangwen Zhao, Dongyun Xue, Wengang Zhou, Li Li, Houqiang Li

MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series

From clinical healthcare to daily living, continuous sensor monitoring across multiple modalities has shown great promise for real-world intelligent decision-making but also faces various challenges. In this work, we introduce MAESTRO, a…

Machine Learning · Computer Science · 2025-10-01 · Payal Mohapatra, Yueyuan Sui, Akash Pandey, Stephen Xia, Qi Zhu

APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning

Pre-trained Vision-Language (V-L) models set the benchmark for generalization to downstream tasks among the noteworthy contenders. Many characteristics of the V-L model have been explored in existing research including the challenge of the…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-24 · Guiming Cao, Kaize Shi, Hong Fu, Huaiwen Zhang, Guandong Xu

Better Reasoning with Less Data: Enhancing VLMs Through Unified Modality Scoring

The application of visual instruction tuning and other post-training techniques has significantly enhanced the capabilities of Large Language Models (LLMs) in visual understanding, enriching Vision-Language Models (VLMs) with more…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-11 · Mingjie Xu, Andrew Estornell, Hongzheng Yang, Yuzhi Zhao, Zhaowei Zhu, Qi Xuan, Jiaheng Wei

Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come…

Sound · Computer Science · 2025-01-31 · Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar

MmAP: Multi-modal Alignment Prompt for Cross-domain Multi-task Learning

Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-15 · Yi Xin, Junlong Du, Qiang Wang, Ke Yan, Shouhong Ding

Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models

Multimodal foundation models have achieved impressive progress across a wide range of vision-language tasks. However, existing approaches often adopt fixed or task-specific fusion strategies, neglecting the intrinsic variability of modality…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-17 · Liam Bennett, Mason Clark, Lucas Anderson, Hana Satou, Olivia Martinez

Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement

Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we…

Machine Learning · Computer Science · 2026-03-20 · Xiang Shi, Rui Zhang, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu