Related papers: Token Caching for Diffusion Transformer Accelerati…

200 papers

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science 2025-02-20 Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a…

Machine Learning · Computer Science 2026-03-30 Dong Liu, Yanxuan Yu, Jiayi Zhang, Yifan Li, Ben Lengerich, Ying Nian Wu

Diffusion transformers have gained significant attention in recent years for their ability to generate high-quality images and videos, yet still suffer from a huge computational cost due to their iterative denoising process. Recently,…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Zhixin Zheng, Xinyu Wang, Chang Zou, Shaobo Wang, Linfeng Zhang

Diffusion Transformers (DiT) have become the dominant methods in image and video generation yet still suffer substantial computational costs. As an effective approach for DiT acceleration, feature caching methods are designed to cache the…

Machine Learning · Computer Science 2025-11-19 Chang Zou, Evelyn Zhang, Runlin Guo, Haohang Xu, Conghui He, Xuming Hu, Linfeng Zhang

Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer…

Machine Learning · Computer Science 2024-11-19 Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang

Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis; however, their iterative denoising process demands substantial computational resources. In this paper, we present a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Xinle Cheng, Zhuoming Chen, Zhihao Jia

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo

Diffusion-based world models have shown strong potential for unified world simulation, but the iterative denoising remains too costly for interactive use and long-horizon rollouts. While feature caching can accelerate inference without…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Weilun Feng, Guoxin Fan, Haotong Qin, Chuanguang Yang, Mingqiang Wu, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Dingrui Wang, Longlong Liao, Michele Magno, Yongjun Xu

Diffusion-based video editing has emerged as an important paradigm for high-quality and flexible content generation. However, despite their generality and strong modeling capacity, Diffusion Transformers (DiT) remain computationally…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Tianyi Liu, Ye Lu, Linfeng Zhang, Chen Cai, Jianjun Gao, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

Feature caching has emerged as an effective strategy to accelerate diffusion transformer (DiT) sampling through temporal feature reuse. It is a challenging problem since (1) progressive error accumulation from cached blocks significantly…

Computer Vision and Pattern Recognition · Computer Science 2025-07-21 Junxiang Qiu, Lin Liu, Shuo Wang, Jinda Lu, Kezhou Chen, Yanbin Hao

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing…

Computation and Language · Computer Science 2025-02-03 Antoine Simoulin, Namyong Park, Xiaoyi Liu, Grey Yang

Diffusion Models have become a cornerstone of modern generative AI for their exceptional generation quality and controllability. However, their inherent multi-step iterations and complex backbone networks lead to…

Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Yasaman Haghighi, Alexandre Alahi

Diffusion models have emerged as a promising approach for generating high-quality, high-dimensional images. Nevertheless, these models are hindered by their high computational cost and slow inference, partly due to the quadratic…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Omid Saghatchian, Atiyeh Gh. Moghadam, Ahmad Nickabadi

Generating temporally-consistent high-fidelity videos can be computationally expensive, especially over longer temporal spans. More-recent Diffusion Transformers (DiTs) -- despite making significant headway in this context -- have only…

Computer Vision and Pattern Recognition · Computer Science 2024-11-08 Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie

Recently, Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation, surpassing U-Net-based models in terms of performance. However, the enhanced capabilities of DiTs come with significant drawbacks,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Junyi Wu, Zhiteng Li, Zheng Hui, Yulun Zhang, Linghe Kong, Xiaokang Yang

Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. However, their inference process remains computationally expensive due to the repeated evaluation of…

Machine Learning · Computer Science 2025-05-23 Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana

Diffusion models have emerged as a powerful paradigm for generative tasks such as image synthesis and video generation, with Transformer architectures further enhancing performance. However, the high computational cost of diffusion…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Huanpeng Chu, Wei Wu, Guanyu Fen, Yutao Zhang