Related papers: LoopAnimate: Loopable Salient Object Animation

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

We propose EverAnimate, an efficient post-training method for long-horizon animated video generation that preserves visual quality and character identity. Long-form animation remains challenging because highly dynamic human motion must be…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-15 · Wuyang Li, Yang Gao, Mariam Hassan, Lan Feng, Wentao Pan, Po-Chien Luan, Alexandre Alahi

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

With the impressive progress in diffusion-based text-to-image generation, extending such powerful generative ability to text-to-video raises enormous attention. Existing methods either require large-scale text-video pairs and a large number…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-18 · Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang

Mobius: Text to Seamless Looping Video Generation via Latent Shift

We present Mobius, a novel method to generate seamlessly looping videos from text descriptions directly without any user annotations, thereby creating new visual materials for the multi-media presentation. Our method repurposes the…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-28 · Xiuli Bi, Jianfei Yuan, Bo Liu, Yong Zhang, Xiaodong Cun, Chi-Man Pun, Bin Xiao

LayerAnimate: Layer-level Control for Animation

Traditional animation production decomposes visual elements into discrete layers to enable independent processing for sketching, refining, coloring, and in-betweening. Existing anime generation video methods typically treat animation as a…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Yuxue Yang, Lue Fan, Zuzeng Lin, Feng Wang, Zhaoxiang Zhang

Lumiere: A Space-Time Diffusion Model for Video Generation

We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-06 · Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-29 · Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

MoVideo: Motion-Aware Video Generation with Diffusion Models

While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-31 · Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan

Video Diffusion Models

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-24 · Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

LaMD: Latent Motion Diffusion for Image-Conditional Video Generation

The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still face challenges in generating coherent and natural…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-21 · Yaosi Hu, Zhenzhong Chen, Chong Luo

Controllable Longer Image Animation with Diffusion Models

Generating realistic animated videos from static images is an important area of research in computer vision. Methods based on physical simulation and motion prediction have achieved notable advances, but they are often limited to specific…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-29 · Qiang Wang, Minghua Liu, Junjun Hu, Fan Jiang, Mu Xu

Generating Long Videos of Dynamic Scenes

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-10 · Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation

This paper introduces EasyAnimate, an efficient and high quality video generation framework that leverages diffusion transformers to achieve high-quality video production, encompassing data processing, model training, and end-to-end…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Jiaqi Xu, Kunzhe Huang, Xinyi Zou, Yunkuo Chen, Bo Liu, MengLi Cheng, Jun Huang, Xing Shi

VideoMerge: Towards Training-free Long Video Generation

Long video generation remains a challenging and compelling topic in computer vision. Diffusion based models, among the various approaches to video generation, have achieved state of the art quality with their iterative denoising procedures.…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-14 · Siyang Zhang, Harry Yang, Ser-Nam Lim

EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation

Conditional human animation traditionally animates static reference images using pose-based motion cues extracted from video data. However, these video-derived cues often suffer from low temporal resolution, motion blur, and unreliable…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-27 · Qiang Qu, Ming Li, Xiaoming Chen, Tongliang Liu

LatentMan: Generating Consistent Animated Characters using Image Diffusion Models

We propose a zero-shot approach for generating consistent videos of animated characters based on Text-to-Image (T2I) diffusion models. Existing Text-to-Video (T2V) methods are expensive to train and require large-scale video datasets to…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Abdelrahman Eldesokey, Peter Wonka

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details. Due to…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-07 · Jianwen Jiang, Chao Liang, Jiaqi Yang, Gaojie Lin, Tianyun Zhong, Yanbo Zheng

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-23 · Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation

Text-to-video diffusion models are notoriously limited in their ability to model temporal aspects such as motion, physics, and dynamic interactions. Existing approaches address this limitation by retraining the model or introducing external…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-05 · Ariel Shaulov, Itay Hazan, Lior Wolf, Hila Chefer