Related papers: Optimisation-Based Multi-Modal Semantic Image Edit…

Interactive Image Manipulation with Complex Text Instructions

Recently, text-guided image manipulation has received increasing attention in the research field of multimedia processing and computer vision due to its high flexibility and controllability. Its goal is to semantically manipulate parts of…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Ryugo Morita, Zhiqiang Zhang, Man M. Ho, Jinjia Zhou

Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking

Scribble-guided image editing allows users to combine simple scribble annotations with text prompts to specify both where and how an image should be edited, enabling flexible interaction with precise spatial control. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Mingyi Xu, Jinpeng Lin, Min Zhou, Tiezheng Ge, Ming Zeng

Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Yuanyuan Chang, Yinghua Yao, Tao Qin, Mengmeng Wang, Ivor Tsang, Guang Dai

Prompt Augmentation for Self-supervised Text-guided Image Manipulation

Text-guided image editing finds applications in various creative and practical fields. While recent studies in image generation have advanced the field, they often struggle with the dual challenges of coherent image transformation and…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim

Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models

Recent advances in diffusion models enable many powerful instruments for image editing. One of these instruments is text-driven image manipulations: editing semantic attributes of an image according to the provided text description.…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Nikita Starodubcev, Dmitry Baranchuk, Valentin Khrulkov, Artem Babenko

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

With the great success of text-conditioned diffusion models in creative text-to-image generation, various text-driven image editing approaches have attracted the attention of many researchers. However, previous works mainly focus on…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zhiyuan Ma, Guoli Jia, Bowen Zhou

S²Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control

Recent advances in diffusion models have enabled high-quality generation and manipulation of images guided by texts, as well as concept learning from images. However, naive applications of existing methods to editing tasks that require…

Computer Vision and Pattern Recognition · Computer Science 2025-12-29 Xudong Liu, Zikun Chen, Ruowei Jiang, Ziyi Wu, Kejia Yin, Han Zhao, Parham Aarabi, Igor Gilitschenski

ControlEdit: A MultiModal Local Clothing Image Editing Method

Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using data such as textual descriptions and visual images as control conditions, which effectively improves the work efficiency of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Di Cheng, YingJie Shi, ShiXin Sun, JiaFu Zhang, WeiJing Wang, Yu Liu

Learning to Follow Object-Centric Image Editing Instructions Faithfully

Natural language instructions are a powerful interface for editing the outputs of text-to-image diffusion models. However, several challenges need to be addressed: 1) underspecification (the need to model the implicit meaning of…

Computation and Language · Computer Science 2023-10-31 Tuhin Chakrabarty, Kanishk Singh, Arkadiy Saakyan, Smaranda Muresan

Action-based image editing guided by human instructions

Text-based image editing is typically approached as a static task that involves operations such as inserting, deleting, or modifying elements of an input image based on human instructions. Given the static nature of this task, in this…

Computer Vision and Pattern Recognition · Computer Science 2025-02-05 Maria Mihaela Trusca, Mingxiao Li, Marie-Francine Moens

Text as Neural Operator: Image Manipulation by Text Instruction

In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community. The input to conditional image generation has evolved from image-only to multimodality. In this paper, we study…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa

DiffEdit: Diffusion-based semantic image editing with mask guidance

Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

Current text-driven image editing methods typically follow one of two directions: relying on large-scale, high-quality editing pair datasets to improve editing precision and diversity, or exploring alternative dataset-free techniques.…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Chenrui Ma, Xi Xiao, Tianyang Wang, Yanning Shen

Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

Professional-grade software applications are powerful but complicated: expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user…

Machine Learning · Statistics 2017-04-18 Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with…

Computer Vision and Pattern Recognition · Computer Science 2023-11-17 Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman

DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images

Text-based semantic image editing assumes the manipulation of an image using a natural language instruction. Although recent works are capable of generating creative and qualitative images, the problem is still mostly approached as a black…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Maria Mihaela Trusca, Tinne Tuytelaars, Marie-Francine Moens

Image Editing with Diffusion Models: A Survey

With deeper exploration of diffusion models, developments in the field of image generation have triggered a boom in image creation. As the quality of base-model generated images continues to improve, so does the demand for further…

Graphics · Computer Science 2025-04-21 Jia Wang, Jie Hu, Xiaoqi Ma, Hanghang Ma, Xiaoming Wei, Enhua Wu

SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI feedback

This paper presents SPIE: a novel approach for semantic and structural post-training of instruction-based image editing diffusion models, addressing key challenges in alignment with user prompts and consistency with input images. We…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Elior Benarous, Yilun Du, Heng Yang

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang, Ming-Ming Cheng

Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training

Instruction-guided image editing consists in taking an image and an instruction and delivering that image altered according to that instruction. State-of-the-art approaches to this task suffer from the typical scaling up and domain…

Computation and Language · Computer Science 2025-03-05 Rodrigo Santos, António Branco, João Silva, João Rodrigues