Related papers: SeaCache: Spectral-Evolution-Aware Cache for Accel…

200 papers

Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Yasaman Haghighi, Alexandre Alahi

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation, yet their iterative denoising process incurs substantial computational cost during inference. Existing caching methods…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Guandong Li

Recent years have witnessed the rapid development of acceleration techniques for diffusion models, especially caching-based acceleration methods. These studies seek to answer two fundamental questions: "When to cache" and "How to use…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Jiazi Bu, Pengyang Ling, Yujie Zhou, Yibin Wang, Yuhang Zang, Dahua Lin, Jiaqi Wang

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Xinyin Ma, Gongfan Fang, Xinchao Wang

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Anirud Aggarwal, Abhinav Shrivastava, Matthew Gwilliam

Existing cache-based acceleration methods for video diffusion models primarily skip early or mid denoising steps, which often leads to structural discrepancies relative to full-timestep generation and can hinder instruction following and…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Zhentao Fan, Zongzuo Wang, Weiwei Zhang

Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. However, their inference process remains computationally expensive due to the repeated evaluation of…

Machine Learning · Computer Science 2025-05-23 Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo

Diffusion models suffer from substantial computational overhead due to their inherently iterative inference process. While feature caching offers a promising acceleration strategy by reusing intermediate outputs across timesteps, naive…

Computer Vision and Pattern Recognition · Computer Science 2026-02-11 Xurui Peng, Chenqian Yan, Hong Liu, Rui Ma, Fangmin Chen, Xing Wang, Zhihua Wu, Songwei Liu, Mingbao Lin

The emergence of diffusion models has significantly advanced generative AI, improving the quality, realism, and creativity of image and video generation. Among them, Stable Diffusion (StableDiff) stands out as a key model for text-to-image…

Hardware Architecture · Computer Science 2025-07-03 Zhican Wang, Guanghui He, Hongxiang Fan

Diffusion-based video editing has emerged as an important paradigm for high-quality and flexible content generation. However, despite their generality and strong modeling capacity, Diffusion Transformers (DiT) remain computationally…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Tianyi Liu, Ye Lu, Linfeng Zhang, Chen Cai, Jianjun Gao, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

Diffusion models have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. These models face two fundamental challenges: strict temporal dependencies…

Machine Learning · Computer Science 2025-09-16 Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Fei Ren, Shaobo Wang, Kaixin Li, Linfeng Zhang

Video generation models have demonstrated remarkable performance, yet their broader adoption remains constrained by slow inference speeds and substantial computational costs, primarily due to the iterative nature of the denoising process.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Xin Zhou, Dingkang Liang, Kaijin Chen, Tianrui Feng, Xiwu Chen, Hongkai Lin, Yikang Ding, Feiyang Tan, Hengshuang Zhao, Xiang Bai

Diffusion and rectified flow (RF) models generate high-fidelity images and videos, but their iterative velocity-field evaluations are computationally expensive. Existing caching methods accelerate sampling by skipping timesteps, yet their…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Xiao Liu, Kai Liu, Naiyang Guan, Hongliang Lu, Zhixin Wang, Zhikai Chen, Renjing Pei, Yulun Zhang

The application of diffusion transformers is suffering from their significant inference costs. Recently, feature caching has been proposed to solve this problem by reusing features from previous timesteps, thereby skipping computation in…

Recent advancements in Diffusion Transformers (DiTs) have established them as the state-of-the-art method for video generation. However, their inherently sequential denoising process results in inevitable latency, limiting real-world…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Hanshuai Cui, Zhiqing Tang, Zhifei Xu, Zhi Yao, Wenyi Zeng, Weijia Jia

Diffusion Transformer (DiT) models have achieved unprecedented quality in image and video generation, yet their iterative sampling process remains computationally prohibitive. To accelerate inference, feature caching methods have emerged by…

Computer Vision and Pattern Recognition · Computer Science 2026-01-13 Guantao Chen, Shikang Zheng, Yuqi Lin, Linfeng Zhang

Generating temporally-consistent high-fidelity videos can be computationally expensive, especially over longer temporal spans. More-recent Diffusion Transformers (DiTs) -- despite making significant headway in this context -- have only…

Computer Vision and Pattern Recognition · Computer Science 2024-11-08 Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Umair Nawaz, Ahmed Heakl, Ufaq Khan, Abdelrahman Shaker, Salman Khan, Fahad Shahbaz Khan