Related papers: Adaptive Multi-Modality Prompt Learning

200 papers

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, although prompt learning…

Computer Vision and Pattern Recognition · Computer Science 2023-04-03 Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks. Our method includes two key designs. First, rather than directly adding together the prompt and the image,…

Computer Vision and Pattern Recognition · Computer Science 2023-03-30 Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks. In this paper, we focus on adapting prompt…

Computer Vision and Pattern Recognition · Computer Science 2023-05-02 Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang

Large-scale multimodal models have shown excellent performance over a series of tasks powered by the large corpus of paired multimodal training data. Generally, they are always assumed to receive modality-complete inputs. However, this…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan

Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee

Missing modality issues are common in real-world applications, arising from factors such as equipment failures and privacy concerns. When fine-tuning pre-trained models on downstream datasets with missing modalities, performance can degrade…

Machine Learning · Computer Science 2025-03-04 Zirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin

Large Multimodal Models (LMMs) exhibit remarkable multi-tasking ability by learning mixed instruction datasets. However, novel tasks would be encountered sequentially in a dynamic world, which calls for equipping LMMs with multimodal…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu

Multi-modal visual understanding of images with prompts involves using various visual and textual cues to enhance the semantic understanding of images. This approach combines both vision and language processing to generate more accurate…

Computer Vision and Pattern Recognition · Computer Science 2023-05-17 Yuzhou Peng

Understanding human perceptions presents a formidable multimodal challenge for computers, encompassing aspects such as sentiment tendencies and sense of humor. While various methods have recently been introduced to extract…

Multimedia · Computer Science 2023-11-21 Hao Sun, Ziwei Niu, Xinyao Yu, Jiaqing Liu, Yen-Wei Chen, Lanfen Lin

Prompt learning is a new learning paradigm which reformulates downstream tasks as similar pretraining tasks on pretrained models by leveraging textual prompts. Recent works have demonstrated that prompt learning is particularly useful for…

Computation and Language · Computer Science 2022-10-21 Yue Zhang, Hongliang Fei, Dingcheng Li, Tan Yu, Ping Li

Pre-trained vision-language models are able to interpret visual concepts and language semantics. Prompt learning, a method of constructing prompts for text encoders or image encoders, elicits the potentials of pre-trained models and readily…

Computer Vision and Pattern Recognition · Computer Science 2025-02-21 Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao

The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applications, the presence of various missing modality cases often leads to a degradation in the…

Computation and Language · Computer Science 2024-07-09 Zirun Guo, Tao Jin, Zhou Zhao

In recent years, soft prompt learning methods have been proposed to fine-tune large-scale vision-language pre-trained models for various downstream tasks. These methods typically combine learnable textual tokens with class tokens as input…

Computer Vision and Pattern Recognition · Computer Science 2024-05-01 Yingjie Tian, Yiqi Wang, Xianda Guo, Zheng Zhu, Long Chen

As powerful pre-trained vision-language models (VLMs) like CLIP gain prominence, numerous studies have attempted to combine VLMs for downstream tasks. Among these, prompt learning has been validated as an effective method for adapting to…

Computer Vision and Pattern Recognition · Computer Science 2024-09-19 Yu Du, Tong Niu, Rong Zhao

Recent progress in deterministic prompt learning has become a promising alternative to various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models. However,…

Computer Vision and Pattern Recognition · Computer Science 2023-04-04 Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn

Soft prompt learning methods are effective for adapting vision-language models (VLMs) to downstream tasks. Nevertheless, empirical evidence reveals a tendency of existing methods that they overfit seen classes and exhibit degraded…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Yang Chen, Shuai Fu, Yu Zhang

Foundation models enable prompt-based classifiers for zero-shot and few-shot learning. Nonetheless, the conventional method of employing fixed prompts suffers from distributional shifts that negatively impact generalizability to unseen…

Machine Learning · Computer Science 2024-10-29 Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models are primarily focused on studying prompt generation and prompt strategies with…

Artificial Intelligence · Computer Science 2024-09-10 Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing…

Machine Learning · Computer Science 2025-11-11 Evelyn Chee, Wynne Hsu, Mong Li Lee