Related papers: Stylus: Automatic Adapter Selection for Diffusion …

200 papers

Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation,…

Computation and Language · Computer Science · 2024-01-01 · Yaru Hao, Zewen Chi, Li Dong, Furu Wei

Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications while customizing such models by fine-tuning is both memory and time inefficient. Motivated by the recent progress in natural language…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-13 · Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu

Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive…

Artificial Intelligence · Computer Science · 2024-04-09 · Shachar Rosenman, Vasudev Lal, Phillip Howard

Recent text-to-image models can generate high-quality images from natural-language prompts, yet controlling typography remains challenging: requested typographic appearance is often ignored or only weakly followed. We address this…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Xia Xin, Yuki Endo, Yoshihiro Kanamori

Diffusion models, which have emerged to become popular text-to-image generation models, can produce high-quality and content-rich images guided by textual prompts. However, there are limitations to semantic understanding and commonsense…

Computation and Language · Computer Science · 2023-11-30 · Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin

Recent advances on instruction fine-tuning have led to the development of various prompting techniques for large language models, such as explicit reasoning steps. However, the success of techniques depends on various parameters, such as…

Recent years have witnessed the strong power of large text-to-image diffusion models for the impressive generative capability to create high-fidelity images. However, it is very tricky to generate desired images using only text prompt as it…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-15 · Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang

Prompt-based models have demonstrated impressive prompt-following capability at image editing tasks. However, the models still struggle with following detailed editing prompts or performing local edits. Specifically, global image quality…

Graphics · Computer Science · 2025-10-20 · Kenan Tang, Yanhong Li, Yao Qin

Text-to-image diffusion models can generate diverse, high-fidelity images based on user-provided text prompts. Recent research has extended these models to support text-guided image editing. While text guidance is an intuitive editing…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-26 · Jooyoung Choi, Yunjey Choi, Yunji Kim, Junho Kim, Sungroh Yoon

Diffusion models have achieved remarkable progress in image and audio generation, largely due to Classifier-Free Guidance. However, the choice of guidance scale remains underexplored: a fixed scale often fails to generalize across prompts…

Sound · Computer Science · 2025-10-07 · Xuanhao Zhang, Chang Li

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However,…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-09 · Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang, Ming-Ming Cheng

We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images…

Foundation models enable prompt-based classifiers for zero-shot and few-shot learning. Nonetheless, the conventional method of employing fixed prompts suffers from distributional shifts that negatively impact generalizability to unseen…

Machine Learning · Computer Science · 2024-10-29 · Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences,…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-23 · Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li

Diffusion models continuously push the boundary of state-of-the-art image generation, but the process is hard to control with any nuance: practice proves that textual prompts are inadequate for accurately describing image style or fine…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-28 · Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné

This study aims to explore efficient tuning methods for the screenshot captioning task. Recently, image captioning has seen significant advancements, but research in captioning tasks for mobile screens remains relatively scarce. Current…

Machine Learning · Computer Science · 2023-09-27 · Ching-Yu Chiang, I-Hua Chang, Shih-Wei Liao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning. Unlike traditional visual systems trained by a fixed set of discrete labels, a new paradigm was introduced in…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-26 · Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-11 · Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

Text-to-image generation models (e.g., Stable Diffusion) have achieved significant advancements, enabling the creation of high-quality and realistic images based on textual descriptions. Prompt inversion, the task of identifying the textual…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Mingzhe Li, Kejing Xia, Gehao Zhang, Zhenting Wang, Guanhong Tao, Siqi Pan, Juan Zhai, Shiqing Ma

Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, particularly minimizing the required adjustment of model parameters. Despite their…

Computation and Language · Computer Science · 2024-06-11 · MohammadAli SadraeiJavaeri, Ehsaneddin Asgari, Alice Carolyn McHardy, Hamid Reza Rabiee