Related papers: Enhancing Diffusion Models with Text-Encoder Reinf…

200 papers

Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function.…

Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-29 · Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Successful Artificial Intelligence systems often require numerous labeled data to extract information from document images. In this paper, we investigate the problem of improving the performance of Artificial Intelligence systems in…

Information Retrieval · Computer Science · 2022-09-27 · Bao-Sinh Nguyen, Dung Tien Le, Hieu M. Vu, Tuan Anh D. Nguyen, Minh-Tien Nguyen, Hung Le

In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-20 · NaHyeon Park, Kunhee Kim, Hyunjung Shim

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-24 · Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-19 · Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li

This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. While diffusion models are widely known to provide excellent generative modeling capability, practical…

Machine Learning · Computer Science · 2024-07-19 · Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, Sergey Levine

Learning from feedback has been shown to enhance the alignment between text prompts and images in text-to-image diffusion models. However, due to the lack of focus in feedback content, especially regarding the object type and quantity,…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Xuexiang Niu, Jinping Tang, Lei Wang, Ge Zhu

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-13 · Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang

Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Due to their adaptiveness in iterative refinement, they provide a strong potential for achieving better non-autoregressive…

Computation and Language · Computer Science · 2024-02-26 · Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-25 · Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Preeti Lamba, Kiran Ravish, Ankita Kushwaha, Pawan Kumar

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-07 · Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example,…

Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-08 · Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki

Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function,…

Machine Learning · Statistics · 2026-02-03 · Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

Recent advancements in diffusion models have introduced fast sampling methods that can effectively produce high-quality images in just one or a few denoising steps. Interestingly, when these are distilled from existing diffusion models,…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-05 · Rinon Gal, Or Lichter, Elad Richardson, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

Text detection and recognition are essential components of a modern OCR system. Most OCR approaches attempt to obtain accurate bounding boxes of text at the detection stage, which is used as the input of the text recognition stage. We…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-27 · Jingqun Tang, Wenming Qian, Luchuan Song, Xiena Dong, Lan Li, Xiang Bai

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-13 · Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

Diffusion models have achieved remarkable success in text-to-image generation. However, their practical applications are hindered by the misalignment between generated images and corresponding text prompts. To tackle this issue,…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Zijing Hu, Fengda Zhang, Long Chen, Kun Kuang, Jiahui Li, Kaifeng Gao, Jun Xiao, Xin Wang, Wenwu Zhu