Related papers: ProAPO: Progressively Automatic Prompt Optimizatio…

200 papers

Pre-trained vision-language models like CLIP have remarkably adapted to various downstream tasks. Nonetheless, their performance heavily depends on the specificity of the input text prompts, which requires skillful prompt template…

Machine Learning · Computer Science 2024-10-22 Yingjun Du, Wenfang Sun, Cees G. M. Snoek

Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily…

Computer Vision and Pattern Recognition · Computer Science 2024-04-25 Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

Text-to-image models are powerful for producing high-quality images based on given text prompts, but crafting these prompts often requires specialized vocabulary. To address this, existing methods train rewriting models with supervision…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Hongji Yang, Yucheng Zhou, Wencheng Han, Jianbing Shen

We present a framework for optimizing prompts in vision-language models to elicit multimodal reasoning without model retraining. Using an evolutionary algorithm to guide prompt updates downstream of visual tasks, our approach improves upon…

Computation and Language · Computer Science 2025-04-01 Sid Bharthulwar, John Rho, Katrina Brown

Vision-language models (VLMs) offer a promising paradigm for image classification by comparing the similarity between images and class embeddings. A critical challenge lies in crafting precise textual representations for class names. While…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Songhao Han, Le Zhuo, Yue Liao, Si Liu

Inspired by text prompts in large language models, visual prompts have been explored to enhance the perceptual capabilities of large vision-language models (LVLMs). However, performance tends to saturate under single visual prompt designs,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Yuan Zhang, Chun-Kai Fan, Sicheng Yu, Junwen Pan, Tao Huang, Ming Lu, Kuan Cheng, Qi She, Shanghang Zhang

The remarkable performance of Large Language Models (LLMs) highly relies on crafted prompts. However, manual prompt engineering is a laborious process, creating a core bottleneck for practical application of LLMs. This phenomenon has led to…

Computation and Language · Computer Science 2025-11-21 Qing Zhang, Bing Xu, Xudong Zhang, Yifan Shi, Yang Li, Chen Zhang, Yik Chung Wu, Ngai Wong, Yijie Chen, Hong Dai, Xiansen Chen, Mian Zhang

Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. Most existing methods rely on human supervision or parameter retraining, both of which…

Computation and Language · Computer Science 2025-05-27 Zhen-Yu Zhang, Jiandong Zhang, Huaxiu Yao, Gang Niu, Masashi Sugiyama

Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort. We propose a simple and…

Computation and Language · Computer Science 2023-10-20 Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

Large language models (LLMs) have become widely adopted as automated judges for evaluating AI-generated content. Despite their success, aligning LLM-based evaluations with human judgments remains challenging. While supervised fine-tuning on…

Artificial Intelligence · Computer Science 2026-02-13 Bo Pan, Xuan Kan, Kaitai Zhang, Yan Yan, Shunwen Tan, Zihao He, Zixin Ding, Junjie Wu, Liang Zhao

Prompt learning has been designed as an alternative to fine-tuning for adapting Vision-language (V-L) models to the downstream tasks. Previous works mainly focus on text prompt while visual prompt works are limited for V-L models. The…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Chen Xu, Yuhan Zhu, Haocheng Shen, Boheng Chen, Yixuan Liao, Xiaoxin Chen, Limin Wang

Going beyond mere fine-tuning of vision-language models (VLMs), learnable prompt tuning has emerged as a promising, resource-efficient alternative. Despite their potential, effectively learning prompts faces the following challenges: (i)…

Computer Vision and Pattern Recognition · Computer Science 2024-06-21 Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N Balasubramanian

Vision-language models (VLMs) have demonstrated exceptional generalization capabilities for downstream tasks. Due to its efficiency, prompt learning has gradually become a more effective and efficient method for transferring VLMs to…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Chenhao Ding, Xinyuan Gao, Songlin Dong, Jizhou Han, Qiang Wang, Zhengdong Zhou, Yuhang He, Yihong Gong

Vision generation remains a challenging frontier in artificial intelligence, requiring seamless integration of visual understanding and generative capabilities. In this paper, we propose a novel framework, Vision-Driven Prompt Optimization…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Leo Franklin, Apiradee Boonmee, Kritsada Wongsuwan

Large pre-trained vision-language models (VLMs) offer a promising approach to leveraging human language for enhancing downstream tasks. However, VLMs such as CLIP face a significant limitation: their performance is highly sensitive to prompt…

Computer Vision and Pattern Recognition · Computer Science 2025-03-13 Ao Li, Zongfang Liu, Xinhua Li, Jinghui Zhang, Pengwei Wang, Hu Wang

Traditional preference tuning methods for LLMs/Visual Generative Models often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang

Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Zih-Ching Chen, Chao-Chun Chen

Prompt engineering has proven to be a crucial step in leveraging pretrained large language models (LLMs) in solving various real-world tasks. Numerous solutions have been proposed that seek to automate prompt engineering by using the model…

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu, Yang Li

Large Language Models (LLMs) have achieved remarkable success across diverse tasks, largely driven by well-designed prompts. However, crafting and selecting such prompts often requires considerable human effort, significantly limiting its…

Computation and Language · Computer Science 2025-03-20 Dengyun Peng, Yuhang Zhou, Qiguang Chen, Jinhao Liu, Jingjing Chen, Libo Qin