Related papers: DiffSparse: Accelerating Diffusion Transformers wi…

Accelerating Diffusion Transformers with Token-wise Feature Caching

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science 2025-02-20 Chang Zou , Xuyang Liu , Ting Liu , Siteng Huang , Linfeng Zhang

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer…

Machine Learning · Computer Science 2024-11-19 Xinyin Ma , Gongfan Fang , Michael Bi Mi , Xinchao Wang

Token Caching for Diffusion Transformer Acceleration

Diffusion transformers have gained substantial interest in diffusion generative modeling due to their outstanding performance. However, their computational demands, particularly the quadratic complexity of attention mechanisms and…

Machine Learning · Computer Science 2026-01-28 Jinming Lou , Wenyang Luo , Yufan Liu , Bing Li , Xinmiao Ding , Weiming Hu , Yuming Li , Chenguang Ma

DiSC: Resolution-Scalable Acceleration of Diffusion Models by Exploiting Sparsity and Cached Token Reuse with Hash-based Distribution

Transformer-based diffusion models offer superior scalability and performance but suffer from high computational overhead due to the iterative nature and quadratic complexity of self-attention at high resolutions. In this paper, we propose…

Hardware Architecture · Computer Science 2026-05-26 Jieon Yoon , Hangyeol Lee , Jaehoon Heo , Joo-Young Kim

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose \textbf{FastCache}, a…

Machine Learning · Computer Science 2026-03-30 Dong Liu , Yanxuan Yu , Jiayi Zhang , Yifan Li , Ben Lengerich , Ying Nian Wu

Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching

This paper presents a method to accelerate the inference process of diffusion transformer (DiT)-based text-to-speech (TTS) models by applying a selective caching mechanism to transformer layers. Specifically, I integrate SmoothCache into…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-11 Siratish Sakpiboonchit

SparseDM: Toward Sparse Efficient Diffusion Models

Diffusion models represent a powerful family of generative models widely used for image and video generation. However, the time-consuming deployment, long inference time, and requirements on large memory hinder their applications on…

Machine Learning · Computer Science 2025-04-18 Kafeng Wang , Jianfei Chen , He Li , Zhenpeng Mi , Jun Zhu

Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation

Image tokenization plays a central role in modern generative modeling by mapping visual inputs into compact representations that serve as an intermediate signal between pixels and generative models. Diffusion-based decoders have recently…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Chuhan Wang , Hao Chen

$\Delta$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Pengtao Chen , Mingzhu Shen , Peng Ye , Jianjian Cao , Chongjun Tu , Christos-Savvas Bouganis , Yiren Zhao , Tao Chen

ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Fanpu Cao , Yaofo Chen , Zeng You , Wei Luo

Rethinking Token-wise Feature Caching: Accelerating Diffusion Transformers with Dual Feature Caching

Diffusion Transformers (DiT) have become the dominant methods in image and video generation yet still suffer substantial computational costs. As an effective approach for DiT acceleration, feature caching methods are designed to cache the…

Machine Learning · Computer Science 2025-11-19 Chang Zou , Evelyn Zhang , Runlin Guo , Haohang Xu , Conghui He , Xuming Hu , Linfeng Zhang

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Diffusion Transformers (DiT) are renowned for their impressive generative performance; however, they are significantly constrained by considerable computational costs due to the quadratic complexity in self-attention and the extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Shuning Chang , Pichao Wang , Jiasheng Tang , Fan Wang , Yi Yang

DeepCache: Accelerating Diffusion Models for Free

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Xinyin Ma , Gongfan Fang , Xinchao Wang

Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching

Diffusion transformers have gained significant attention in recent years for their ability to generate high-quality images and videos, yet still suffer from a huge computational cost due to their iterative denoising process. Recently,…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Zhixin Zheng , Xinyu Wang , Chang Zou , Shaobo Wang , Linfeng Zhang

DiCache: Let Diffusion Model Determine Its Own Cache

Recent years have witnessed the rapid development of acceleration techniques for diffusion models, especially caching-based acceleration methods. These studies seek to answer two fundamental questions: "When to cache" and "How to use…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Jiazi Bu , Pengyang Ling , Yujie Zhou , Yibin Wang , Yuhang Zang , Dahua Lin , Jiaqi Wang

DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

While diffusion models have achieved great success in the field of video generation, this progress is accompanied by a rapidly escalating computational burden. Among the existing acceleration methods, Feature Caching is popular due to its…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Chang Zou , Changlin Li , Yang Li , Patrol Li , Jianbing Wu , Xiao He , Songtao Liu , Zhao Zhong , Kailin Huang , Linfeng Zhang

DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference

Diffusion models produce high quality images but inference is costly due to many denoising steps and heavy matrix operations. We present DiffPro, a post-training, hardware-faithful framework that works with the exact integer kernels used in…

Machine Learning · Computer Science 2025-11-17 Farhana Amin , Sabiha Afroz , Kanchon Gharami , Mona Moghadampanah , Dimitrios S. Nikolopoulos

Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Anirud Aggarwal , Abhinav Shrivastava , Matthew Gwilliam

Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences

Efficient Transformers have been developed for long sequence modeling, due to their subquadratic memory and time complexity. Sparse Transformer is a popular approach to improving the efficiency of Transformers by restricting self-attention…

Machine Learning · Computer Science 2023-02-01 Aosong Feng , Irene Li , Yuang Jiang , Rex Ying

InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present InvarDiff, a training-free acceleration method that…

Computer Vision and Pattern Recognition · Computer Science 2025-12-08 Zihao Wu