Related papers: Mobile Video Diffusion

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

Video diffusion models have recently made great progress in generation quality, but are still limited by the high memory and computational requirements. This is because current video diffusion models often attempt to process…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

MoViE: Mobile Diffusion for Video Editing

Recent progress in diffusion-based video editing has shown remarkable potential for practical applications. However, these methods remain prohibitively expensive and challenging to deploy on mobile devices. In this study, we introduce a…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Adil Karjauv, Noor Fathima, Ioannis Lelekas, Fatih Porikli, Amir Ghodrati, Amirhossein Habibian

Video Probabilistic Diffusion Models in Projected Latent Space

Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality and complex temporal dynamics along with large spatial…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin

Taming Diffusion Transformer for Efficient Mobile Video Generation in Seconds

Diffusion Transformers (DiT) have shown strong performance in video generation tasks, but their high computational cost makes them impractical for resource-constrained devices like smartphones, and practical on-device generation is even…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Yushu Wu, Yanyu Li, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ke Ma, Arpit Sahni, Ju Hu, Aliaksandr Siarohin, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov

MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices

Recently, video generation has witnessed rapid advancements, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the substantial computational complexity and slow generation speed of diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Shuai Zhang, Bao Tang, Siyuan Yu, Yueting Zhu, Jingfeng Yao, Ya Zou, Shanglin Yuan, Li Yu, Wenyu Liu, Xinggang Wang

GVD: Guiding Video Diffusion Model for Scalable Video Distillation

To address the larger computation and storage requirements associated with large video datasets, video dataset distillation aims to capture spatial and temporal information in a significantly smaller dataset, such that training on the…

Computer Vision and Pattern Recognition · Computer Science 2025-07-31 Kunyang Li, Jeffrey A Chan Santiago, Sarinda Dhanesh Samarasinghe, Gaowen Liu, Mubarak Shah

MagicVideo: Efficient Video Generation With Latent Diffusion Models

We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo. MagicVideo can generate smooth video clips that are concordant with the given text descriptions. Due to a novel and efficient 3D…

Computer Vision and Pattern Recognition · Computer Science 2023-05-12 Daquan Zhou, Weimin Wang, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng

MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices

The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and slow inference speed. In this paper, we propose MobileDiffusion, a highly efficient text-to-image…

Computer Vision and Pattern Recognition · Computer Science 2024-06-13 Yang Zhao, Yanwu Xu, Zhisheng Xiao, Haolin Jia, Tingbo Hou

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-15 Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang

OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution

Recently, latent diffusion models have demonstrated promising performance in the real-world video super-resolution (VSR) task, reconstructing high-quality videos from distorted low-resolution input through multiple diffusion steps.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Hanting Li, Huaao Tang, Jianhong Han, Tianxiong Zhou, Jiulong Cui, Haizhen Xie, Yan Chen, Jie Hu

Dynamic Video Generation: Shaping Video Generation Across Time and Space

Diffusion models have achieved impressive performance in video generation, but their iterative denoising process remains computationally expensive due to the large number of tokens processed at each timestep. Recently, progressive…

Computer Vision and Pattern Recognition · Computer Science 2026-05-21 Shikang Zheng, Jingkai Huang, Jiacheng Liu, Guantao Chen, Lixuan, Yuqi Lin, Peiliang Cai, Linfeng Zhang

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

Streaming Video Diffusion: Online Video Editing with Diffusion Models

We present a novel task called online video editing, which is designed to edit streaming frames while maintaining temporal consistency. Unlike existing offline video editing assuming all frames are pre-established and accessible,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Feng Chen, Zhen Yang, Bohan Zhuang, Qi Wu

JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation

We introduce the Joint Video-Image Diffusion model (JVID), a novel approach to generating high-quality and temporally coherent videos. We achieve this by integrating two diffusion models: a Latent Image Diffusion Model (LIDM) trained on…

Computer Vision and Pattern Recognition · Computer Science 2024-09-30 Hadrien Reynaud, Matthew Baugh, Mischa Dombrowski, Sarah Cechnicka, Qingjie Meng, Bernhard Kainz

Efficient Video Diffusion Models: Advancements and Challenges

Video diffusion models have rapidly become the dominant paradigm for high-fidelity generative video synthesis, but their practical deployment remains constrained by severe inference costs. Compared with image generation, video synthesis…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Shitong Shao, Lichen Bai, Pengfei Wan, James Kwok, Zeke Xie

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion mainly relies on several components for acceleration: (1)…

Computer Vision and Pattern Recognition · Computer Science 2025-12-19 Jintao Zhang, Kaiwen Zheng, Kai Jiang, Haoxu Wang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu

SF-V: Single Forward Video Generation Model

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from…

Computer Vision and Pattern Recognition · Computer Science 2025-06-11 Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren

SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces

Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have…

Computer Vision and Pattern Recognition · Computer Science 2024-09-05 Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo

Video Diffusion Models

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet