Related papers: Stylus: Automatic Adapter Selection for Diffusion …

Optimizing Prompts for Text-to-Image Generation

Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation,…

Computation and Language · Computer Science 2024-01-01 Yaru Hao, Zewen Chi, Li Dong, Furu Wei

A Closer Look at Parameter-Efficient Tuning in Diffusion Models

Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications while customizing such models by fine-tuning is both memory and time inefficient. Motivated by the recent progress in natural language…

Computer Vision and Pattern Recognition · Computer Science 2023-04-13 Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive…

Artificial Intelligence · Computer Science 2024-04-09 Shachar Rosenman, Vasudev Lal, Phillip Howard

FontUse: A Data-Centric Approach to Style- and Use-Case-Conditioned In-Image Typography

Recent text-to-image models can generate high-quality images from natural-language prompts, yet controlling typography remains challenging: requested typographic appearance is often ignored or only weakly followed. We address this…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Xia Xin, Yuki Endo, Yoshihiro Kanamori

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

Diffusion models, which have emerged to become popular text-to-image generation models, can produce high-quality and content-rich images guided by textual prompts. However, there are limitations to semantic understanding and commonsense…

Computation and Language · Computer Science 2023-11-30 Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin

Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection

Recent advances on instruction fine-tuning have led to the development of various prompting techniques for large language models, such as explicit reasoning steps. However, the success of techniques depends on various parameters, such as…

Computation and Language · Computer Science 2025-02-11 Maximilian Spliethöver, Tim Knebler, Fabian Fumagalli, Maximilian Muschalik, Barbara Hammer, Eyke Hüllermeier, Henning Wachsmuth

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

Recent years have witnessed the strong power of large text-to-image diffusion models for the impressive generative capability to create high-fidelity images. However, it is very tricky to generate desired images using only text prompt as it…

Computer Vision and Pattern Recognition · Computer Science 2023-08-15 Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang

SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow

Prompt-based models have demonstrated impressive prompt-following capability at image editing tasks. However, the models still struggle with following detailed editing prompts or performing local edits. Specifically, global image quality…

Graphics · Computer Science 2025-10-20 Kenan Tang, Yanhong Li, Yao Qin

Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models

Text-to-image diffusion models can generate diverse, high-fidelity images based on user-provided text prompts. Recent research has extended these models to support text-guided image editing. While text guidance is an intuitive editing…

Computer Vision and Pattern Recognition · Computer Science 2023-05-26 Jooyoung Choi, Yunjey Choi, Yunji Kim, Junho Kim, Sungroh Yoon

Prompt-aware classifier free guidance for diffusion models

Diffusion models have achieved remarkable progress in image and audio generation, largely due to Classifier-Free Guidance. However, the choice of guidance scale remains underexplored: a fixed scale often fails to generalize across prompts…

Sound · Computer Science 2025-10-07 Xuanhao Zhang, Chang Li

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang, Ming-Ming Cheng

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

Prompt Diffusion Robustifies Any-Modality Prompt Learning

Foundation models enable prompt-based classifiers for zero-shot and few-shot learning. Nonetheless, the conventional method of employing fixed prompts suffers from distributional shifts that negatively impact generalizability to unseen…

Machine Learning · Computer Science 2024-10-29 Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-23 Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Diffusion models continuously push the boundary of state-of-the-art image generation, but the process is hard to control with any nuance: practice proves that textual prompts are inadequate for accurately describing image style or fine…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné

BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning

This study aims to explore efficient tuning methods for the screenshot captioning task. Recently, image captioning has seen significant advancements, but research in captioning tasks for mobile screens remains relatively scarce. Current…

Machine Learning · Computer Science 2023-09-27 Ching-Yu Chiang, I-Hua Chang, Shih-Wei Liao

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning. Unlike traditional visual systems trained by a fixed set of discrete labels, a new paradigm was introduced in…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

PromptFix: You Prompt and We Fix the Photo

Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-11 Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models

Text-to-image generation models (e.g., Stable Diffusion) have achieved significant advancements, enabling the creation of high-quality and realistic images based on textual descriptions. Prompt inversion, the task of identifying the textual…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Mingzhe Li, Kejing Xia, Gehao Zhang, Zhenting Wang, Guanhong Tao, Siqi Pan, Juan Zhai, Shiqing Ma

SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings

Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, particularly minimizing the required adjustment of model parameters. Despite their…

Computation and Language · Computer Science 2024-06-11 MohammadAli SadraeiJavaeri, Ehsaneddin Asgari, Alice Carolyn McHardy, Hamid Reza Rabiee