Related papers: ClickDiffusion: Harnessing LLMs for Interactive Pr…

Blended Diffusion for Text-driven Editing of Natural Images

Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with…

Computer Vision and Pattern Recognition · Computer Science 2023-03-22 Omri Avrahami, Dani Lischinski, Ohad Fried

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li

Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text Instructions

Machine learning has enabled the development of powerful systems capable of editing images from natural language instructions. However, in many common scenarios it is difficult for users to specify precise image transformations with text…

Artificial Intelligence · Computer Science 2024-02-14 Alec Helbling, Seongmin Lee, Polo Chau

InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning

Instruction-based image editing has made great progress in using natural human language to manipulate the visual content of images. However, existing models are limited by the quality of the dataset and cannot accurately localize editing…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Tiancheng Li, Jinxiu Liu, Huajun Chen, Qi Liu

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality images from detailed textual descriptions, they often lack the ability to precisely edit the generated or real images. In this paper, we propose…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, a significant effort is given to enable the modification of these images using text only as means to offer intuitive and versatile…

Computer Vision and Pattern Recognition · Computer Science 2023-07-04 Linoy Tsaban, Apolinário Passos

LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models

Research in vision-language models has seen rapid developments of late, enabling natural language-based interfaces for image generation and manipulation. Many existing text guided manipulation techniques are restricted to specific classes…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Paramanand Chandramouli, Kanchana Vaishnavi Gandikota

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it possible to generate rich kinds of novel photorealistic images. However, current models still face misalignment issues (e.g., problematic spatial…

Computer Vision and Pattern Recognition · Computer Science 2023-08-15 Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua

DiffUTE: Universal Text Editing Diffusion Model

Diffusion model based language-guided image editing has achieved great success recently. However, existing state-of-the-art diffusion models struggle with rendering correct text and text style during generation. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science 2023-10-19 Haoxing Chen, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Xing Zheng, Yaohui Li, Changhua Meng, Huijia Zhu, Weiqiang Wang

Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models

Recent advances in diffusion models enable many powerful instruments for image editing. One of these instruments is text-driven image manipulations: editing semantic attributes of an image according to the provided text description.…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Nikita Starodubcev, Dmitry Baranchuk, Valentin Khrulkov, Artem Babenko

Generating Illustrated Instructions

We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs. We identify desiderata unique to this task, and formalize it through a suite of automatic and human evaluation…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Sachit Menon, Ishan Misra, Rohit Girdhar

ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation

While language-guided image manipulation has made remarkable progress, the challenge of how to instruct the manipulation process faithfully reflecting human intentions persists. An accurate and comprehensive description of a manipulation…

Computer Vision and Pattern Recognition · Computer Science 2023-08-03 Yasheng Sun, Yifan Yang, Houwen Peng, Yifei Shen, Yuqing Yang, Han Hu, Lili Qiu, Hideki Koike

HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation

Text-driven person image generation is an emerging and challenging task in cross-modality image generation. Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.…

Computer Vision and Pattern Recognition · Computer Science 2022-11-14 Kaiduo Zhang, Muyi Sun, Jianxin Sun, Binghao Zhao, Kunbo Zhang, Zhenan Sun, Tieniu Tan

TexSliders: Diffusion-Based Texture Editing in CLIP Space

Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion…

Graphics · Computer Science 2024-05-02 Julia Guerrero-Viu, Milos Hasan, Arthur Roullier, Midhun Harikumar, Yiwei Hu, Paul Guerrero, Diego Gutierrez, Belen Masia, Valentin Deschaintre

DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images

Text-based semantic image editing assumes the manipulation of an image using a natural language instruction. Although recent works are capable of generating creative and qualitative images, the problem is still mostly approached as a black…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Maria Mihaela Trusca, Tinne Tuytelaars, Marie-Francine Moens

Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images

Text-to-image generative models have made remarkable advancements in generating high-quality images. However, generated images often contain undesirable artifacts or other errors due to model limitations. Existing techniques to fine-tune…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Peyman Gholami, Robert Xiao

Self-correcting LLM-controlled Diffusion Models

Text-to-image generation has witnessed significant progress with the advent of diffusion models. Despite the ability to generate photorealistic images, current text-to-image diffusion models still often struggle to accurately interpret and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models

With the rise of large, publicly-available text-to-image diffusion models, text-guided real image editing has garnered much research attention recently. Existing methods tend to either rely on some form of per-instance or per-task…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Adham Elarabawy, Harish Kamath, Samuel Denton

LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps

The advancement of text-to-image synthesis has introduced powerful generative models capable of creating realistic images from textual prompts. However, precise control over image attributes remains challenging, especially at the instance…

Computer Vision and Pattern Recognition · Computer Science 2025-01-27 Andrey Palaev, Adil Khan, Syed M. Ahsan Kazmi

TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing

Thanks to the powerful language comprehension capabilities of Large Language Models (LLMs), existing instruction-based image editing methods have introduced Multimodal Large Language Models (MLLMs) to promote information exchange between…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Yujie Hu, Zecheng Tang, Xu Jiang, Weiqi Li, Jian Zhang