Related papers

Related papers: InstructMix2Mix: Consistent Sparse-View Editing Th…

200 papers

We present an inference-time diffusion sampling method to perform multi-view consistent image editing using pre-trained 2D image editing models. These models can independently produce high-quality edits for each image in a set of multi-view…

Computer Vision and Pattern Recognition · Computer Science 2025-10-17 Hadi Alzayer, Yunzhi Zhang, Chen Geng, Jia-Bin Huang, Jiajun Wu

While diffusion models have demonstrated remarkable progress in 2D image generation and editing, extending these capabilities to 3D editing remains challenging, particularly in maintaining multi-view consistency. Classical approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Yufeng Chi, Huimin Ma, Kafeng Wang, Jianmin Li

This paper proposes Instruct 4D-to-4D that achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang

Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models for generating plausible images at novel viewpoints or for distilling pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Zi-Xin Zou, Weihao Cheng, Yan-Pei Cao, Shi-Sheng Huang, Ying Shan, Song-Hai Zhang

While 2D diffusion models have achieved remarkable success in identity-preserving personalization, extending this capability to 3D assets remains a significant challenge due to the complexities of multi-view consistency and spatial control.…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Jinxin Ai, Matthias Nießner, Ziya Erkoç

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing method of single image editing…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Gihyun Kwon, Jangho Park, Jong Chul Ye

Despite the great success of large-scale text-to-image diffusion models in image generation and image editing, existing methods still struggle to edit the layout of real images. Although a few works have been proposed to tackle this…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Tao Xia, Yudi Zhang, Ting Liu, Lei Zhang

With recent advances in Multimodal Large Language Models (MLLMs) showing strong visual understanding and reasoning, interest is growing in using them to improve the editing performance of diffusion models. Despite rapid progress, most…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Chong Mou, Qichao Sun, Yanze Wu, Pengze Zhang, Xinghui Li, Fulong Ye, Songtao Zhao, Qian He

The remarkable generative capabilities of diffusion models have motivated extensive research in both image and video editing. Compared to video editing which faces additional challenges in the time dimension, image editing has witnessed the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Wenqi Ouyang, Yi Dong, Lei Yang, Jianlou Si, Xingang Pan

Recent advances in diffusion transformers have shown remarkable generalization in visual synthesis, yet most dense perception methods still rely on text-to-image (T2I) generators designed for stochastic generation. We revisit this paradigm…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Yiqing Shi, Yiren Song, Mike Zheng Shou

This paper introduces a novel approach to synthesize texture to dress up a given 3D object, given a text prompt. Based on the pretrained text-to-image (T2I) diffusion model, existing methods usually employ a project-and-inpaint approach, in…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yuxin Liu, Minshan Xie, Hanyuan Liu, Tien-Tsin Wong

Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, where multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-27 Pufan Li, Bi'an Du, Shenghe Zheng, Junyi Yao, Wei Hu

Image generation and editing have seen a great deal of advancements with the rise of large-scale diffusion models that allow user control of different modalities such as text, mask, depth maps, etc. However, controlled editing of videos…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 AmirHossein Zamani, Amir G. Aghdam, Tiberiu Popa, Eugene Belilovsky

We present a novel method for 3D scene editing using diffusion models, designed to ensure view consistency and realism across perspectives. Our approach leverages attention features extracted from a single reference image to define the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Eyal Gomel, Lior Wolf

Diffusion models can synthesize realistic co-speech video from audio for various applications, such as video creation and virtual agents. However, existing diffusion-based methods are slow due to numerous denoising steps and costly…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Beijia Lu, Ziyi Chen, Jing Xiao, Jun-Yan Zhu

Recent advances in 3D representations, such as Neural Radiance Fields and 3D Gaussian Splatting, have greatly improved realistic scene modeling and novel-view synthesis. However, achieving controllable and consistent editing in dynamic 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Kai He, Chin-Hsuan Wu, Igor Gilitschenski

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this…

Computer Vision and Pattern Recognition · Computer Science 2023-01-19 Tim Brooks, Aleksander Holynski, Alexei A. Efros

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the…

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Zhizhuo Zhou, Shubham Tulsiani