Related papers: ProAPO: Progressively Automatic Prompt Optimizatio…

IPO: Interpretable Prompt Optimization for Vision-Language Models

Pre-trained vision-language models like CLIP have remarkably adapted to various downstream tasks. Nonetheless, their performance heavily depends on the specificity of the input text prompts, which requires skillful prompt template…

Machine Learning · Computer Science 2024-10-22 Yingjun Du, Wenfang Sun, Cees G. M. Snoek

Progressive Multi-modal Conditional Prompt Tuning

Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily…

Computer Vision and Pattern Recognition · Computer Science 2024-04-25 Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation

Text-to-image models are powerful for producing high-quality images based on given text prompts, but crafting these prompts often requires specialized vocabulary. To address this, existing methods train rewriting models with supervision…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Hongji Yang, Yucheng Zhou, Wencheng Han, Jianbing Shen

Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models

We present a framework for optimizing prompts in vision-language models to elicit multimodal reasoning without model retraining. Using an evolutionary algorithm to guide prompt updates downstream of visual tasks, our approach improves upon…

Computation and Language · Computer Science 2025-04-01 Sid Bharthulwar, John Rho, Katrina Brown

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions

Vision-language models (VLMs) offer a promising paradigm for image classification by comparing the similarity between images and class embeddings. A critical challenge lies in crafting precise textual representations for class names. While…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Songhao Han, Le Zhuo, Yue Liao, Si Liu

AutoV: Loss-Oriented Ranking for Visual Prompt Retrieval in LVLMs

Inspired by text prompts in large language models, visual prompts have been explored to enhance the perceptual capabilities of large vision-language models (LVLMs). However, performance tends to saturate under single visual prompt designs,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Yuan Zhang, Chun-Kai Fan, Sicheng Yu, Junwen Pan, Tao Huang, Ming Lu, Kuan Cheng, Qi She, Shanghang Zhang

ELPO: Ensemble Learning Based Prompt Optimization for Large Language Models

The remarkable performance of Large Language Models (LLMs) highly relies on crafted prompts. However, manual prompt engineering is a laborious process, creating a core bottleneck for practical application of LLMs. This phenomenon has led to…

Computation and Language · Computer Science 2025-11-21 Qing Zhang, Bing Xu, Xudong Zhang, Yifan Shi, Yang Li, Chen Zhang, Yik Chung Wu, Ngai Wong, Yijie Chen, Hong Dai, Xiansen Chen, Mian Zhang

In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement

Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. Most existing methods rely on human supervision or parameter retraining, both of which…

Computation and Language · Computer Science 2025-05-27 Zhen-Yu Zhang, Jiandong Zhang, Huaxiu Yao, Gang Niu, Masashi Sugiyama

Automatic Prompt Optimization with "Gradient Descent" and Beam Search

Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort. We propose a simple and…

Computation and Language · Computer Science 2023-10-20 Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

Bi-Level Prompt Optimization for Multimodal LLM-as-a-Judge

Large language models (LLMs) have become widely adopted as automated judges for evaluating AI-generated content. Despite their success, aligning LLM-based evaluations with human judgments remains challenging. While supervised fine-tuning on…

Artificial Intelligence · Computer Science 2026-02-13 Bo Pan, Xuan Kan, Kaitai Zhang, Yan Yan, Shunwen Tan, Zihao He, Zixin Ding, Junjie Wu, Liang Zhao

Progressive Visual Prompt Learning with Contrastive Feature Re-formation

Prompt learning has been designed as an alternative to fine-tuning for adapting vision-language (V-L) models to downstream tasks. Previous works mainly focus on text prompts, while work on visual prompts for V-L models is limited. The…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Chen Xu, Yuhan Zhu, Haocheng Shen, Boheng Chen, Yixuan Liao, Xiaoxin Chen, Limin Wang

Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?

Going beyond mere fine-tuning of vision-language models (VLMs), learnable prompt tuning has emerged as a promising, resource-efficient alternative. Despite their potential, effectively learning prompts faces the following challenges: (i)…

Computer Vision and Pattern Recognition · Computer Science 2024-06-21 Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N Balasubramanian

LPT: Less-overfitting Prompt Tuning for Vision-Language Model

Vision-language models (VLMs) have demonstrated exceptional generalization capabilities for downstream tasks. Due to its efficiency, prompt learning has gradually become a more effective and efficient method for transferring VLMs to…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Chenhao Ding, Xinyuan Gao, Songlin Dong, Jizhou Han, Qiang Wang, Zhengdong Zhou, Yuhang He, Yihong Gong

Vision-Driven Prompt Optimization for Large Language Models in Multimodal Generative Tasks

Vision generation remains a challenging frontier in artificial intelligence, requiring seamless integration of visual understanding and generative capabilities. In this paper, we propose a novel framework, Vision-Driven Prompt Optimization…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Leo Franklin, Apiradee Boonmee, Kritsada Wongsuwan

Modeling Variants of Prompts for Vision-Language Models

Large pre-trained vision-language models (VLMs) offer a promising approach to leveraging human language for enhancing downstream tasks. However, VLMs such as CLIP face a significant limitation: their performance is highly sensitive to prompt…

Computer Vision and Pattern Recognition · Computer Science 2025-03-13 Ao Li, Zongfang Liu, Xinhua Li, Jinghui Zhang, Pengwei Wang, Hu Wang

RPO: Fine-Tuning Visual Generative Models via Rich Vision-Language Preferences

Traditional preference tuning methods for LLMs/Visual Generative Models often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang

Human-Free Automated Prompting for Vision-Language Anomaly Detection: Prompt Optimization with Meta-guiding Prompt Scheme

Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Zih-Ching Chen, Chao-Chun Chen

Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation

Prompt engineering has proven to be a crucial step in leveraging pretrained large language models (LLMs) in solving various real-world tasks. Numerous solutions have been proposed that seek to automate prompt engineering by using the model…

Computation and Language · Computer Science 2025-07-15 Muzhaffar Hazman, Minh-Khoi Pham, Shweta Soundararajan, Goncalo Mordido, Leonardo Custode, David Lynch, Giorgio Cruciata, Yucheng Shi, Hongmeng Song, Wang Chao, Pan Yue, Aleksandar Milenovic, Alexandros Agapitos

Evolving Prompt Adaptation for Vision-Language Models

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu, Yang Li

DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective

Large Language Models (LLMs) have achieved remarkable success across diverse tasks, largely driven by well-designed prompts. However, crafting and selecting such prompts often requires considerable human effort, significantly limiting its…

Computation and Language · Computer Science 2025-03-20 Dengyun Peng, Yuhang Zhou, Qiguang Chen, Jinhao Liu, Jingjing Chen, Libo Qin