Related papers: Towards Robust Sequential Decomposition for Comple…

ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies

Text-driven image editing has achieved remarkable success in following single instructions. However, real-world scenarios often involve complex, multi-step instructions, particularly "chain" instructions where operations are…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Chenglin Wang, Yucheng Zhou, Qianning Wang, Zhe Wang, Kai Zhang

Instruction-based Image Editing with Planning, Reasoning, and Generation

Editing images via instruction provides a natural way to generate interactive content, but it is a big challenge due to the higher requirement of scene understanding and generation. Prior work utilizes a chain of large language models,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-27 Liya Ji, Chenyang Qi, Qifeng Chen

Video4Edit: Viewing Image Editing as a Degenerate Temporal Process

We observe that recent advances in multimodal foundation models have propelled instruction-driven image generation and editing into a genuinely cross-modal, cooperative regime. Nevertheless, state-of-the-art editing pipelines remain costly:…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Xiaofan Li, Yanpeng Sun, Chenming Wu, Fan Duan, YuAn Wang, Weihao Bo, Yumeng Zhang, Dingkang Liang

Fine-tuning Large Language Models with Sequential Instructions

Despite the success of existing instruction-tuned models, we find that they usually struggle to respond to queries with multiple instructions. This impairs their performance in complex problems whose solution consists of multiple…

Computation and Language · Computer Science 2024-07-04 Hanxu Hu, Simon Yu, Pinzhen Chen, Edoardo M. Ponti

Improving Editability in Image Generation with Layer-wise Memory

Most real-world image editing tasks require multiple sequential edits to achieve desired results. Current editing approaches, primarily designed for single-object modifications, struggle with sequential editing: especially with maintaining…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Daneul Kim, Jaeah Lee, Jaesik Park

Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

Current text-driven image editing methods typically follow one of two directions: relying on large-scale, high-quality editing pair datasets to improve editing precision and diversity, or exploring alternative dataset-free techniques.…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Chenrui Ma, Xi Xiao, Tianyang Wang, Yanning Shen

Learning to Follow Object-Centric Image Editing Instructions Faithfully

Natural language instructions are a powerful interface for editing the outputs of text-to-image diffusion models. However, several challenges need to be addressed: 1) underspecification (the need to model the implicit meaning of…

Computation and Language · Computer Science 2023-10-31 Tuhin Chakrabarty, Kanishk Singh, Arkadiy Saakyan, Smaranda Muresan

CompBench: Benchmarking Complex Instruction-guided Image Editing

While real-world applications increasingly demand intricate scene manipulation, existing instruction-guided image editing benchmarks often oversimplify task complexity and lack comprehensive, fine-grained instructions. To bridge this gap,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Bohan Jia, Wenxuan Huang, Yuntian Tang, Junbo Qiao, Jincheng Liao, Shaosheng Cao, Fei Zhao, Zhaopeng Feng, Zhouhong Gu, Zhenfei Yin, Lei Bai, Wanli Ouyang, Lin Chen, Fei Zhao, Yao Hu, Zihan Wang, Yuan Xie, Shaohui Lin

DreamOmni2: Multimodal Instruction-based Editing and Generation

Recent advancements in instruction-based image editing and subject-driven generation have garnered significant attention, yet both tasks still face limitations in meeting practical user needs. Instruction-based editing relies solely on…

Computer Vision and Pattern Recognition · Computer Science 2025-10-09 Bin Xia, Bohao Peng, Yuechen Zhang, Junjia Huang, Jiyang Liu, Jingyao Li, Haoru Tan, Sitong Wu, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia

$\texttt{Complex-Edit}$: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark

We introduce $\texttt{Complex-Edit}$, a comprehensive benchmark designed to systematically evaluate instruction-based image editing models across instructions of varying complexity. To develop this benchmark, we harness GPT-4o to…

Computer Vision and Pattern Recognition · Computer Science 2025-04-18 Siwei Yang, Mude Hui, Bingchen Zhao, Yuyin Zhou, Nataniel Ruiz, Cihang Xie

Learning to Model Editing Processes

Most existing sequence generation models produce outputs in one pass, usually left-to-right. However, this is in contrast with a more natural approach that humans use in generating content: iterative refinement and editing. Recent work has…

Computation and Language · Computer Science 2022-05-26 Machel Reid, Graham Neubig

Sequential Relational Decomposition

The concept of decomposition in computer science and engineering is considered a fundamental component of computational thinking and is prevalent in design of algorithms, software construction, hardware design, and more. We propose a simple…

Logic in Computer Science · Computer Science 2023-06-22 Dror Fried, Axel Legay, Joël Ouaknine, Moshe Y. Vardi

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing

Instruction-based image editing focuses on equipping a generative model with the capacity to adhere to human-written instructions for editing images. Current approaches typically comprehend explicit and specific instructions. However, they…

Computer Vision and Pattern Recognition · Computer Science 2024-06-03 Ying Jin, Pengyang Ling, Xiaoyi Dong, Pan Zhang, Jiaqi Wang, Dahua Lin

InsightEdit: Towards Better Instruction Following for Image Editing

In this paper, we focus on the task of instruction-based image editing. Previous works like InstructPix2Pix, InstructDiffusion, and SmartEdit have explored end-to-end editing. However, two limitations still remain: First, existing datasets…

Computer Vision and Pattern Recognition · Computer Science 2024-11-27 Yingjing Xu, Jie Kong, Jiazhi Wang, Xiao Pan, Bo Lin, Qiang Liu

Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking

Scribble-guided image editing allows users to combine simple scribble annotations with text prompts to specify both where and how an image should be edited, enabling flexible interaction with precise spatial control. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Mingyi Xu, Jinpeng Lin, Min Zhou, Tiezheng Ge, Ming Zeng

SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control

Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply each instruction in the prompt with a fixed…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Arman Zarei, Samyadeep Basu, Mobina Pournemat, Sayan Nag, Ryan Rossi, Soheil Feizi

Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches

Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches. Since sketches are difficult to collect, previous methods mainly use edge maps instead of sketches to…

Computer Vision and Pattern Recognition · Computer Science 2020-01-10 Shuai Yang, Zhangyang Wang, Jiaying Liu, Zongming Guo

InstructVEdit: A Holistic Approach for Instructional Video Editing

Video editing according to instructions is a highly challenging task due to the difficulty in collecting large-scale, high-quality edited video pair data. This scarcity not only limits the availability of training data but also hinders the…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Chi Zhang, Chengjian Feng, Feng Yan, Qiming Zhang, Mingjin Zhang, Yujie Zhong, Jing Zhang, Lin Ma

Sequential Attention GAN for Interactive Image Editing

Most existing text-to-image synthesis tasks are static single-turn generation, based on pre-defined textual descriptions of images. To explore more practical and interactive real-life applications, we introduce a new task - Interactive…

Computer Vision and Pattern Recognition · Computer Science 2020-08-07 Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao

Successive Prompting for Decomposing Complex Questions

Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question…

Computation and Language · Computer Science 2022-12-09 Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner