Related papers: MiLDEdit: Reasoning-Based Multi-Layer Design Docum…

200 papers

Text rendering has recently emerged as one of the most challenging frontiers in visual generation, drawing significant attention from large-scale diffusion and multimodal models. However, text editing within images remains largely…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-19 · Rui Gui, Yang Wan, Haochen Han, Dongxing Mao, Fangming Liu, Min Li, Alex Jinpeng Wang

Large Multi-modality Models (LMMs) have made significant progress in visual understanding and generation, but they still face challenges in General Visual Editing, particularly in following complex instructions, preserving appearance…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-28 · Xiangyu Zhao, Peiyuan Zhang, Kexian Tang, Xiaorong Zhu, Hao Li, Wenhao Chai, Zicheng Zhang, Renqiu Xia, Guangtao Zhai, Junchi Yan, Hua Yang, Xue Yang, Haodong Duan

With the rapid advancement of commercial multi-modal models, image editing has garnered significant attention due to its widespread applicability in daily life. Despite impressive progress, existing image editing systems, particularly…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Yiran Zhao, Yaoqi Ye, Xiang Liu, Michael Qizhe Shieh, Trung Bui

Existing image editing methods can handle simple editing instructions very well. To deal with complex editing instructions, they often need to jointly fine-tune the large language models (LLMs) and diffusion models (DMs), which involves…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-03 · Yijia Wang, Yiqing Shen, Weiming Chen, Zhihai He

Spreadsheets are central to real-world applications such as enterprise reporting, auditing, and scientific data management. Despite their ubiquity, existing large language model based approaches typically treat tables as plain text,…

Computation and Language · Computer Science · 2026-04-15 · Houxing Ren, Mingjie Zhan, Zimu Lu, Ke Wang, Yunqiao Yang, Haotian Hou, Hongsheng Li

Despite the remarkable capabilities of text-to-image (T2I) generation models, real-world applications often demand fine-grained, iterative image editing that existing methods struggle to provide. Key challenges include granular instruction…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-26 · Zihan Liang, Jiahao Sun, Haoran Ma

Recent advances in image editing models have shown remarkable progress. A common architectural design couples a multimodal large language model (MLLM) encoder with a diffusion decoder, as seen in systems such as Step1X-Edit and…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Fukun Yin, Shiyu Liu, Yucheng Han, Zhibo Wang, Peng Xing, Rui Wang, Wei Cheng, Yingming Wang, Aojie Li, Zixin Yin, Pengtao Chen, Xiangyu Zhang, Daxin Jiang, Xianfang Zeng, Gang Yu

Recent advances in multi-modal generative models have driven substantial improvements in image editing. However, current generative models still struggle with handling diverse and complex image editing tasks that require implicit reasoning,…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Feng Han, Yibin Wang, Chenglin Li, Zheming Liang, Dianyi Wang, Yang Jiao, Zhipeng Wei, Chao Gong, Cheng Jin, Jingjing Chen, Jiaqi Wang

Multimodal generative models have made significant strides in image editing, demonstrating impressive performance on a variety of static tasks. However, their proficiency typically does not extend to complex scenarios requiring dynamic…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-05 · Zhiqiang Sheng, Xumeng Han, Zhiwei Zhang, Zenghui Xiong, Yifan Ding, Aoxiang Ping, Xiang Li, Tong Guo, Yao Mao

Recent advances in AI-generated content (AIGC) have significantly accelerated image editing techniques, driving increasing demand for diverse and fine-grained edits. Despite these advances, existing image editing methods still face…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-27 · Shuyu Wang, Weiqi Li, Qian Wang, Shijie Zhao, Jian Zhang

Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language models (VLMs), no existing editors tackle reasoning-heavy tasks, which typically…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-13 · Jiaxing Qiu, Kaihua Hou, Roxana Daneshjou, Ahmed Alaa, Thomas Hartvigsen

Multimodal Large Language Models (MLLMs) struggle with precise reasoning for structured visuals like charts and diagrams, as pixel-based perception lacks a mechanism for verification. To address this, we propose to leverage derendering --…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-11 · Junhong Shen, Mu Cai, Bo Hu, Ameet Talwalkar, David A Ross, Cordelia Schmid, Alireza Fathi

Natural language processing evaluation has made significant progress, largely driven by the proliferation of powerful large language models (LLMs). New evaluation benchmarks are of increasing priority as the reasoning capabilities of LLMs…

Computation and Language · Computer Science · 2025-06-19 · Joseph J. Peper, Wenzhao Qiu, Ali Payani, Lu Wang

Structured images (e.g., charts and geometric diagrams) remain challenging for multimodal large language models (MLLMs), as perceptual slips can cascade into erroneous conclusions. Intermediate visual cues can steer reasoning; however,…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Shuoshuo Zhang, Zijian Li, Yizhen Zhang, Jingjing Fu, Lei Song, Jiang Bian, Jun Zhang, Yujiu Yang, Rui Wang

In recent years, image editing models have made significant progress, enabling users to manipulate visual content in a flexible and interactive manner through natural language instructions. However, an important yet underexplored research…

Recently, how to achieve precise image editing has attracted increasing attention, especially given the remarkable success of text-to-image generation models. To unify various spatial-aware image editing abilities into one framework, we…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-22 · Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang

We present SMART-Editor, a framework for compositional layout and content editing across structured (posters, websites) and unstructured (natural images) domains. Unlike prior models that perform local edits, SMART-Editor preserves global…

Computation and Language · Computer Science · 2025-08-06 · Ishani Mondal, Meera Bharadwaj, Ayush Roy, Aparna Garimella, Jordan Lee Boyd-Graber

Current instruction-based editing methods, such as InstructPix2Pix, often fail to produce satisfactory results in complex scenarios due to their dependence on the simple CLIP text encoder in diffusion models. To rectify this, this paper…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-13 · Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan

Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this process using Large Multimodal Models (LMMs), yet…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-04 · Jiazhe Wei, Ken Li, Tianyu Lao, Haofan Wang, Liang Wang, Caifeng Shan, Chenyang Si

Instruction-based image editing has emerged as a key capability for unified multimodal models (UMMs), yet constructing large-scale, diverse, and high-quality editing datasets without costly proprietary APIs remains challenging. Previous…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-25 · Guanzhou Chen, Erfei Cui, Changyao Tian, Danni Yang, Ganlin Yang, Yu Qiao, Hongsheng Li, Gen Luo, Hongjie Zhang