Related papers: Towards Robust Sequential Decomposition for Comple…

200 papers

Text-driven image editing has achieved remarkable success in following single instructions. However, real-world scenarios often involve complex, multi-step instructions, particularly "chain" instructions where operations are…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Chenglin Wang, Yucheng Zhou, Qianning Wang, Zhe Wang, Kai Zhang

Editing images via instruction provides a natural way to generate interactive content, but it is a big challenge due to the higher requirement of scene understanding and generation. Prior work utilizes a chain of large language models,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-27 Liya Ji, Chenyang Qi, Qifeng Chen

We observe that recent advances in multimodal foundation models have propelled instruction-driven image generation and editing into a genuinely cross-modal, cooperative regime. Nevertheless, state-of-the-art editing pipelines remain costly:…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Xiaofan Li, Yanpeng Sun, Chenming Wu, Fan Duan, YuAn Wang, Weihao Bo, Yumeng Zhang, Dingkang Liang

Despite the success of existing instruction-tuned models, we find that they usually struggle to respond to queries with multiple instructions. This impairs their performance in complex problems whose solution consists of multiple…

Computation and Language · Computer Science 2024-07-04 Hanxu Hu, Simon Yu, Pinzhen Chen, Edoardo M. Ponti

Most real-world image editing tasks require multiple sequential edits to achieve desired results. Current editing approaches, primarily designed for single-object modifications, struggle with sequential editing, especially with maintaining…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Daneul Kim, Jaeah Lee, Jaesik Park

Current text-driven image editing methods typically follow one of two directions: relying on large-scale, high-quality editing pair datasets to improve editing precision and diversity, or exploring alternative dataset-free techniques.…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Chenrui Ma, Xi Xiao, Tianyang Wang, Yanning Shen

Natural language instructions are a powerful interface for editing the outputs of text-to-image diffusion models. However, several challenges need to be addressed: 1) underspecification (the need to model the implicit meaning of…

Computation and Language · Computer Science 2023-10-31 Tuhin Chakrabarty, Kanishk Singh, Arkadiy Saakyan, Smaranda Muresan

While real-world applications increasingly demand intricate scene manipulation, existing instruction-guided image editing benchmarks often oversimplify task complexity and lack comprehensive, fine-grained instructions. To bridge this gap,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Bohan Jia, Wenxuan Huang, Yuntian Tang, Junbo Qiao, Jincheng Liao, Shaosheng Cao, Fei Zhao, Zhaopeng Feng, Zhouhong Gu, Zhenfei Yin, Lei Bai, Wanli Ouyang, Lin Chen, Fei Zhao, Yao Hu, Zihan Wang, Yuan Xie, Shaohui Lin

Recent advancements in instruction-based image editing and subject-driven generation have garnered significant attention, yet both tasks still face limitations in meeting practical user needs. Instruction-based editing relies solely on…

Computer Vision and Pattern Recognition · Computer Science 2025-10-09 Bin Xia, Bohao Peng, Yuechen Zhang, Junjia Huang, Jiyang Liu, Jingyao Li, Haoru Tan, Sitong Wu, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia

We introduce Complex-Edit, a comprehensive benchmark designed to systematically evaluate instruction-based image editing models across instructions of varying complexity. To develop this benchmark, we harness GPT-4o to…

Computer Vision and Pattern Recognition · Computer Science 2025-04-18 Siwei Yang, Mude Hui, Bingchen Zhao, Yuyin Zhou, Nataniel Ruiz, Cihang Xie

Most existing sequence generation models produce outputs in one pass, usually left-to-right. However, this is in contrast with a more natural approach that humans use in generating content: iterative refinement and editing. Recent work has…

Computation and Language · Computer Science 2022-05-26 Machel Reid, Graham Neubig

The concept of decomposition in computer science and engineering is considered a fundamental component of computational thinking and is prevalent in the design of algorithms, software construction, hardware design, and more. We propose a simple…

Logic in Computer Science · Computer Science 2023-06-22 Dror Fried, Axel Legay, Joël Ouaknine, Moshe Y. Vardi

Instruction-based image editing focuses on equipping a generative model with the capacity to adhere to human-written instructions for editing images. Current approaches typically comprehend explicit and specific instructions. However, they…

Computer Vision and Pattern Recognition · Computer Science 2024-06-03 Ying Jin, Pengyang Ling, Xiaoyi Dong, Pan Zhang, Jiaqi Wang, Dahua Lin

In this paper, we focus on the task of instruction-based image editing. Previous works like InstructPix2Pix, InstructDiffusion, and SmartEdit have explored end-to-end editing. However, two limitations still remain: First, existing datasets…

Computer Vision and Pattern Recognition · Computer Science 2024-11-27 Yingjing Xu, Jie Kong, Jiazhi Wang, Xiao Pan, Bo Lin, Qiang Liu

Scribble-guided image editing allows users to combine simple scribble annotations with text prompts to specify both where and how an image should be edited, enabling flexible interaction with precise spatial control. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Mingyi Xu, Jinpeng Lin, Min Zhou, Tiezheng Ge, Ming Zeng

Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply each instruction in the prompt with a fixed…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Arman Zarei, Samyadeep Basu, Mobina Pournemat, Sayan Nag, Ryan Rossi, Soheil Feizi

Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by human-drawn sketches. Since sketches are difficult to collect, previous methods mainly use edge maps instead of sketches to…

Computer Vision and Pattern Recognition · Computer Science 2020-01-10 Shuai Yang, Zhangyang Wang, Jiaying Liu, Zongming Guo

Video editing according to instructions is a highly challenging task due to the difficulty in collecting large-scale, high-quality edited video pair data. This scarcity not only limits the availability of training data but also hinders the…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Chi Zhang, Chengjian Feng, Feng Yan, Qiming Zhang, Mingjin Zhang, Yujie Zhong, Jing Zhang, Lin Ma

Most existing text-to-image synthesis tasks are static single-turn generation, based on pre-defined textual descriptions of images. To explore more practical and interactive real-life applications, we introduce a new task - Interactive…

Computer Vision and Pattern Recognition · Computer Science 2020-08-07 Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao

Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question…

Computation and Language · Computer Science 2022-12-09 Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner