Related papers: Adaptive Multi-Modality Prompt Learning

Learning Domain Invariant Prompt for Vision-Language Models

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, although prompt learning…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-03 · Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

Unleashing the Power of Visual Prompting At the Pixel Level

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks. Our method includes two key designs. First, rather than directly adding together the prompt and the image,…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-30 · Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT

Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks. In this paper, we focus on adapting prompt…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-02 · Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang

Large-scale multimodal models have shown excellent performance over a series of tasks powered by the large corpus of paired multimodal training data. Generally, they are assumed to receive modality-complete inputs. However, this…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-22 · Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan

Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models

Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang

Multimodal Prompting with Missing Modalities for Visual Recognition

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-10 · Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee

Efficient Prompting for Continual Adaptation to Missing Modalities

Missing modality issues are common in real-world applications, arising from factors such as equipment failures and privacy concerns. When fine-tuning pre-trained models on downstream datasets with missing modalities, performance can degrade…

Machine Learning · Computer Science · 2025-03-04 · Zirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin

ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt

Large Multimodal Models (LMMs) exhibit remarkable multi-tasking ability by learning mixed instruction datasets. However, novel tasks are encountered sequentially in a dynamic world, which calls for equipping LMMs with multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-26 · Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Multi-modal visual understanding of images with prompts involves using various visual and textual cues to enhance the semantic understanding of images. This approach combines both vision and language processing to generate more accurate…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-17 · Yuzhou Peng

Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding

Understanding human perceptions presents a formidable multimodal challenge for computers, encompassing aspects such as sentiment tendencies and sense of humor. While various methods have recently been introduced to extract…

Multimedia · Computer Science · 2023-11-21 · Hao Sun, Ziwei Niu, Xinyao Yu, Jiaqing Liu, Yen-Wei Chen, Lanfen Lin

Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models

Prompt learning is a new learning paradigm which reformulates downstream tasks as similar pretraining tasks on pretrained models by leveraging textual prompts. Recent works have demonstrated that prompt learning is particularly useful for…

Computation and Language · Computer Science · 2022-10-21 · Yue Zhang, Hongliang Fei, Dingcheng Li, Tan Yu, Ping Li

Modular Prompt Learning Improves Vision-Language Models

Pre-trained vision-language models are able to interpret visual concepts and language semantics. Prompt learning, a method of constructing prompts for text encoders or image encoders, elicits the potential of pre-trained models and readily…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-21 · Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applications, the presence of various missing modality cases often leads to a degradation in the…

Computation and Language · Computer Science · 2024-07-09 · Zirun Guo, Tao Jin, Zhou Zhao

Multi-Prompt with Depth Partitioned Cross-Modal Learning

In recent years, soft prompt learning methods have been proposed to fine-tune large-scale vision-language pre-trained models for various downstream tasks. These methods typically combine learnable textual tokens with class tokens as input…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-01 · Yingjie Tian, Yiqi Wang, Xianda Guo, Zheng Zhu, Long Chen

Mixture of Prompt Learning for Vision Language Models

As powerful pre-trained vision-language models (VLMs) like CLIP gain prominence, numerous studies have attempted to combine VLMs for downstream tasks. Among these, prompt learning has been validated as an effective method for adapting to…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-19 · Yu Du, Tong Niu, Rong Zhao

Probabilistic Prompt Learning for Dense Prediction

Recent progress in deterministic prompt learning has become a promising alternative to various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models. However,…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-04 · Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn

MoPD: Mixture-of-Prompts Distillation for Vision-Language Models

Soft prompt learning methods are effective for adapting vision-language models (VLMs) to downstream tasks. Nevertheless, empirical evidence reveals that existing methods tend to overfit seen classes and exhibit degraded…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-15 · Yang Chen, Shuai Fu, Yu Zhang

Prompt Diffusion Robustifies Any-Modality Prompt Learning

Foundation models enable prompt-based classifiers for zero-shot and few-shot learning. Nonetheless, the conventional method of employing fixed prompts suffers from distributional shifts that negatively impact generalizability to unseen…

Machine Learning · Computer Science · 2024-10-29 · Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models are primarily focused on studying prompt generation and prompt strategies with…

Artificial Intelligence · Computer Science · 2024-09-10 · Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang

Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing…

Machine Learning · Computer Science · 2025-11-11 · Evelyn Chee, Wynne Hsu, Mong Li Lee