Related papers: Fast and Memory-Efficient Video Diffusion Using St…

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer

Accelerating Video Diffusion Models via Distribution Matching

Generative models, particularly diffusion models, have made significant success in data synthesis across various modalities, including images, videos, and 3D assets. However, current diffusion models are computationally intensive, often…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Yuanzhi Zhu, Hanshu Yan, Huan Yang, Kai Zhang, Junnan Li

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

In this paper, we present FasterCache, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. By analyzing existing cache-based methods, we observe that…

Computer Vision and Pattern Recognition · Computer Science 2025-03-13 Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong

Efficient Video Diffusion Models: Advancements and Challenges

Video diffusion models have rapidly become the dominant paradigm for high-fidelity generative video synthesis, but their practical deployment remains constrained by severe inference costs. Compared with image generation, video synthesis…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Shitong Shao, Lichen Bai, Pengfei Wan, James Kwok, Zeke Xie

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion mainly relies on several components for acceleration: (1)…

Computer Vision and Pattern Recognition · Computer Science 2025-12-19 Jintao Zhang, Kaiwen Zheng, Kai Jiang, Haoxu Wang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu

Video-Infinity: Distributed Long Video Generation

Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang

SparseDM: Toward Sparse Efficient Diffusion Models

Diffusion models represent a powerful family of generative models widely used for image and video generation. However, the time-consuming deployment, long inference time, and requirements on large memory hinder their applications on…

Machine Learning · Computer Science 2025-04-18 Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation

Diffusion models can synthesize realistic co-speech video from audio for various applications, such as video creation and virtual agents. However, existing diffusion-based methods are slow due to numerous denoising steps and costly…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Beijia Lu, Ziyi Chen, Jing Xiao, Jun-Yan Zhu

DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion

Recent advancements in diffusion models have revolutionized video generation, enabling the creation of high-quality, temporally consistent videos. However, generating high frame-rate (FPS) videos remains a significant challenge due to…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Geunmin Hwang, Hyun-kyu Ko, Younghyun Kim, Seungryong Lee, Eunbyung Park

DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

While diffusion models have achieved great success in the field of video generation, this progress is accompanied by a rapidly escalating computational burden. Among the existing acceleration methods, Feature Caching is popular due to its…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Chang Zou, Changlin Li, Yang Li, Patrol Li, Jianbing Wu, Xiao He, Songtao Liu, Zhao Zhong, Kailin Huang, Linfeng Zhang

PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

Video generation has been advancing rapidly, and diffusion transformer (DiT) based models have demonstrated remarkable capabilities. However, their practical deployment is often hindered by slow inference speeds and high memory con-…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Sijie Wang, Qiang Wang, Shaohuai Shi

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Diffusion-based video super-resolution (VSR) methods deliver strong perceptual quality but are often unsuitable for latency-sensitive scenarios due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks. Despite these advances, diffusion-based video…

Graphics · Computer Science 2025-05-27 Shenggan Cheng, Yuanxin Wei, Lansong Diao, Yong Liu, Bujiao Chen, Lianghua Huang, Yu Liu, Wenyuan Yu, Jiangsu Du, Wei Lin, Yang You

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performances in terms of FID and CLIP-Score…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Clément Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence,…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang

Coherent Video Inpainting Using Optical Flow-Guided Efficient Diffusion

The text-guided video inpainting technique has significantly improved the performance of content generation applications. A recent family for these improvements uses diffusion models, which have become essential for achieving high-quality…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Bohai Gu, Hao Luo, Song Guo, Peiran Dong, Qihua Zhou

Object-Centric Diffusion for Efficient Video Editing

Diffusion-based video editing methods have reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

SwiftFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs

Diffusion Transformers (DiTs) have gained increasing adoption in high-quality image and video generation. As demand for higher-resolution images and longer videos increases, single-GPU inference becomes inefficient due to increased latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Jiacheng Yang, Jun Wu, Yaoyao Ding, Zhiying Xu, Yida Wang, Gennady Pekhimenko