Related papers: Object-Centric Diffusion for Efficient Video Editi…

Temporally Consistent Object Editing in Videos using Extended Attention

Image generation and editing have seen a great deal of advancements with the rise of large-scale diffusion models that allow user control of different modalities such as text, mask, depth maps, etc. However, controlled editing of videos…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 AmirHossein Zamani , Amir G. Aghdam , Tiberiu Popa , Eugene Belilovsky

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Michal Geyer , Omer Bar-Tal , Shai Bagon , Tali Dekel

DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing

We present a diffusion-based video editing framework, namely DiffusionAtlas, which can achieve both frame consistency and high fidelity in editing video object appearance. Despite the success in image editing, diffusion models still…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Shao-Yu Chang , Hwann-Tzong Chen , Tyng-Luh Liu

Diffusion-Based Attention Warping for Consistent 3D Scene Editing

We present a novel method for 3D scene editing using diffusion models, designed to ensure view consistency and realism across perspectives. Our approach leverages attention features extracted from a single reference image to define the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Eyal Gomel , Lior Wolf

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.They either finetune the model, or invert the image in the latent space of the pretrained model. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Senmao Li , Joost van de Weijer , Taihang Hu , Fahad Shahbaz Khan , Qibin Hou , Yaxing Wang , Jian Yang , Ming-Ming Cheng

Blended Latent Diffusion under Attention Control for Real-World Video Editing

Due to lack of fully publicly available text-to-video models, current video editing methods tend to build on pre-trained text-to-image generation models, however, they still face grand challenges in dealing with the local editing of video…

Computer Vision and Pattern Recognition · Computer Science 2024-09-06 Deyin Liu , Lin Yuanbo Wu , Xianghua Xie

InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing

Large text-to-image diffusion models have achieved remarkable success in generating diverse, high-quality images. Additionally, these models have been successfully leveraged to edit input images by just changing the text prompt. But when…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Anant Khandelwal

Streaming Video Diffusion: Online Video Editing with Diffusion Models

We present a novel task called online video editing, which is designed to edit \textbf{streaming} frames while maintaining temporal consistency. Unlike existing offline video editing assuming all frames are pre-established and accessible,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Feng Chen , Zhen Yang , Bohan Zhuang , Qi Wu

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural video editing…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Wenhao Chai , Xun Guo , Gaoang Wang , Yan Lu

EffiVED:Efficient Video Editing via Text-instruction Diffusion Models

Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets. Current video editing methods commonly require per-video fine-tuning of…

Computer Vision and Pattern Recognition · Computer Science 2024-06-06 Zhenghao Zhang , Zuozhuo Dai , Long Qin , Weizhi Wang

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

Diffusion models have demonstrated remarkable capabilities in text-to-image and text-to-video generation, opening up possibilities for video editing based on textual input. However, the computational cost associated with sequential sampling…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Youyuan Zhang , Xuan Ju , James J. Clark

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

Text-guided diffusion models have revolutionized image generation and editing, offering exceptional realism and diversity. Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Xuan Ju , Ailing Zeng , Yuxuan Bian , Shaoteng Liu , Qiang Xu

Learning Object-Centric Representations Based on Slots in Real World Scenarios

A central goal in AI is to represent scenes as compositions of discrete objects, enabling fine-grained, controllable image and video generation. Yet leading diffusion models treat images holistically and rely on text conditioning, creating…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Adil Kaan Akan

EraserDiT: Fast Video Inpainting with Diffusion Transformer Model

Video object removal and inpainting are critical tasks in the fields of computer vision and multimedia processing, aimed at restoring missing or corrupted regions in video sequences. Traditional methods predominantly rely on flow-based…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Jie Liu , Zheng Hui

VidToMe: Video Token Merging for Zero-Shot Video Editing

Diffusion models have made significant advances in generating high-quality images, but their application to video generation has remained challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Xirui Li , Chao Ma , Xiaokang Yang , Ming-Hsuan Yang

SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 Ziyi Wu , Jingyu Hu , Wuyue Lu , Igor Gilitschenski , Animesh Garg

Consistent Image Layout Editing with Diffusion Models

Despite the great success of large-scale text-to-image diffusion models in image generation and image editing, existing methods still struggle to edit the layout of real images. Although a few works have been proposed to tackle this…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Tao Xia , Yudi Zhang , Ting Liu Lei Zhang

Dreamix: Video Diffusion Models are General Video Editors

Text-driven image and video diffusion models have recently achieved unprecedented generation realism. While diffusion models have been successfully applied for image editing, very few works have done so for video editing. We present the…

Computer Vision and Pattern Recognition · Computer Science 2023-02-03 Eyal Molad , Eliahu Horwitz , Dani Valevski , Alex Rav Acha , Yossi Matias , Yael Pritch , Yaniv Leviathan , Yedid Hoshen

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Pengzhi Li , QInxuan Huang , Yikang Ding , Zhiheng Li

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Text-to-video editing aims to edit the visual appearance of a source video conditional on textual prompts. A major challenge in this task is to ensure that all frames in the edited video are visually consistent. Most recent works apply…

Computer Vision and Pattern Recognition · Computer Science 2024-03-04 Yuren Cong , Mengmeng Xu , Christian Simon , Shoufa Chen , Jiawei Ren , Yanping Xie , Juan-Manuel Perez-Rua , Bodo Rosenhahn , Tao Xiang , Sen He