Related papers: InstructMix2Mix: Consistent Sparse-View Editing Th…

Coupled Diffusion Sampling for Training-Free Multi-View Image Editing

We present an inference-time diffusion sampling method to perform multi-view consistent image editing using pre-trained 2D image editing models. These models can independently produce high-quality edits for each image in a set of multi-view…

Computer Vision and Pattern Recognition · Computer Science 2025-10-17 Hadi Alzayer , Yunzhi Zhang , Chen Geng , Jia-Bin Huang , Jiajun Wu

DisCo3D: Distilling Multi-View Consistency for 3D Scene Editing

While diffusion models have demonstrated remarkable progress in 2D image generation and editing, extending these capabilities to 3D editing remains challenging, particularly in maintaining multi-view consistency. Classical approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Yufeng Chi , Huimin Ma , Kafeng Wang , Jianmin Li

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

This paper proposes Instruct 4D-to-4D that achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Linzhan Mou , Jun-Kun Chen , Yu-Xiong Wang

Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views

Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models for generating plausible images at novel viewpoints or for distilling pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Zi-Xin Zou , Weihao Cheng , Yan-Pei Cao , Shi-Sheng Huang , Ying Shan , Song-Hai Zhang

DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing

While 2D diffusion models have achieved remarkable success in identity-preserving personalization, extending this capability to 3D assets remains a significant challenge due to the complexities of multi-view consistency and spatial control.…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Jinxin Ai , Matthias Nießner , Ziya Erkoç

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Bosheng Qin , Juncheng Li , Siliang Tang , Tat-Seng Chua , Yueting Zhuang

Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing method of single image editing…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Gihyun Kwon , Jangho Park , Jong Chul Ye

Consistent Image Layout Editing with Diffusion Models

Despite the great success of large-scale text-to-image diffusion models in image generation and image editing, existing methods still struggle to edit the layout of real images. Although a few works have been proposed to tackle this…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Tao Xia , Yudi Zhang , Ting Liu , Lei Zhang

InstructX: Towards Unified Visual Editing with MLLM Guidance

With recent advances in Multimodal Large Language Models (MLLMs) showing strong visual understanding and reasoning, interest is growing in using them to improve the editing performance of diffusion models. Despite rapid progress, most…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Chong Mou , Qichao Sun , Yanze Wu , Pengze Zhang , Xinghui Li , Fulong Ye , Songtao Zhao , Qian He

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

The remarkable generative capabilities of diffusion models have motivated extensive research in both image and video editing. Compared to video editing which faces additional challenges in the time dimension, image editing has witnessed the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Wenqi Ouyang , Yi Dong , Lei Yang , Jianlou Si , Xingang Pan

Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers

Recent advances in diffusion transformers have shown remarkable generalization in visual synthesis, yet most dense perception methods still rely on text-to-image (T2I) generators designed for stochastic generation. We revisit this paradigm…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Yiqing Shi , Yiren Song , Mike Zheng Shou

Text-Guided Texturing by Synchronized Multi-View Diffusion

This paper introduces a novel approach to synthesize texture to dress up a given 3D object, given a text prompt. Based on the pretrained text-to-image (T2I) diffusion model, existing methods usually employ a project-and-inpaint approach, in…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yuxin Liu , Minshan Xie , Hanyuan Liu , Tien-Tsin Wong

View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity

Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, where multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-27 Pufan Li , Bi'an Du , Shenghe Zheng , Junyi Yao , Wei Hu

Temporally Consistent Object Editing in Videos using Extended Attention

Image generation and editing have seen a great deal of advancements with the rise of large-scale diffusion models that allow user control of different modalities such as text, mask, depth maps, etc. However, controlled editing of videos…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 AmirHossein Zamani , Amir G. Aghdam , Tiberiu Popa , Eugene Belilovsky

Diffusion-Based Attention Warping for Consistent 3D Scene Editing

We present a novel method for 3D scene editing using diffusion models, designed to ensure view consistency and realism across perspectives. Our approach leverages attention features extracted from a single reference image to define the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Eyal Gomel , Lior Wolf

Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation

Diffusion models can synthesize realistic co-speech video from audio for various applications, such as video creation and virtual agents. However, existing diffusion-based methods are slow due to numerous denoising steps and costly…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Beijia Lu , Ziyi Chen , Jing Xiao , Jun-Yan Zhu

CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion

Recent advances in 3D representations, such as Neural Radiance Fields and 3D Gaussian Splatting, have greatly improved realistic scene modeling and novel-view synthesis. However, achieving controllable and consistent editing in dynamic 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Kai He , Chin-Hsuan Wu , Igor Gilitschenski

InstructPix2Pix: Learning to Follow Image Editing Instructions

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this…

Computer Vision and Pattern Recognition · Computer Science 2023-01-19 Tim Brooks , Aleksander Holynski , Alexei A. Efros

LUSD: Localized Update Score Distillation for Text-Guided Image Editing

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the…

Graphics · Computer Science 2025-07-03 Worameth Chinchuthakun , Tossaporn Saengja , Nontawat Tritrong , Pitchaporn Rewatbowornwong , Pramook Khungurn , Supasorn Suwajanakorn

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Zhizhuo Zhou , Shubham Tulsiani