Related papers: Beyond Fixed Formulas: Data-Driven Linear Predicto…

Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers

Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation, yet their iterative denoising process incurs high computational costs. Existing training-free acceleration methods rely on…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-23 · Hanshuai Cui, Zhiqing Tang, Qianli Ma, Zhi Yao, Weijia Jia

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers (DiTs) pose a significant challenge to their practical deployment. While feature…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-28 · Peiliang Cai, Jiacheng Liu, Haowen Xu, Xinyu Wang, Chang Zou, Linfeng Zhang

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer…

Machine Learning · Computer Science · 2024-11-19 · Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang

ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-16 · Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo

Latent Diffusion Planning for Imitation Learning

Recent progress in imitation learning has been enabled by policy architectures that scale to complex visuomotor tasks, multimodal distributions, and large datasets. However, these methods often rely on learning from large amounts of expert…

Robotics · Computer Science · 2025-04-24 · Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a…

Machine Learning · Computer Science · 2026-03-30 · Dong Liu, Yanxuan Yu, Jiayi Zhang, Yifan Li, Ben Lengerich, Ying Nian Wu

Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

Diffusion Transformers (DiTs) have demonstrated remarkable performance in visual generation tasks. However, their low inference speed limits their deployment in low-resource applications. Recent training-free approaches exploit the…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-11 · Xiaoliu Guan, Lielin Jiang, Hanqi Chen, Xu Zhang, Jiaxing Yan, Guanzhong Wang, Yi Liu, Zetao Zhang, Yu Wu

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-07 · Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Learning to Prompt for Continual Learning

The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task…

Machine Learning · Computer Science · 2022-03-23 · Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

L2P: Unlocking Latent Potential for Pixel Generation

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-13 · Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Jiawei Chen, Zhuoqi Zeng, Wei Zhang, Chengjie Wang, Jian Yang, Ying Tai

DiP: Taming Diffusion Models in Pixel Space

Diffusion models face a fundamental trade-off between generation quality and computational efficiency. Latent Diffusion Models (LDMs) offer an efficient solution but suffer from potential information loss and non-end-to-end training. In…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-27 · Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Xiaobin Hu, Hanzhen Zhao, Chengjie Wang, Jian Yang, Ying Tai

FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we…

Machine Learning · Computer Science · 2025-02-28 · Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, Edgar Schönfeld

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Diffusion Transformers (DiT) have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. To solve this problem, feature caching has been proposed to accelerate…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-12 · Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, Linfeng Zhang

Dynamic Diffusion Transformer

Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You

SoftCap: Soft-Budget Control for Diffusion Transformer Acceleration

Diffusion Transformers (DiTs) achieve strong visual quality, but their iterative denoising process requires many costly Transformer evaluations. Training-free acceleration methods reduce this cost by caching, forecasting, or verifying…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Yuhang Zhang, Junxiang Qiu, Huixia Ben, Zhenhua Tang, Shuo Wang, Yanbin Hao

Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation

Diffusion-based talking head models generate high-quality, photorealistic videos but suffer from slow inference, limiting practical applications. Existing acceleration methods for general diffusion models fail to exploit the temporal and…

Graphics · Computer Science · 2026-01-21 · Jianzhi Long, Wenhao Sun, Rongcheng Tu, Dacheng Tao

Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition

Diffusion transformer (DiT) models have achieved remarkable success in image generation, thanks to their exceptional generative capabilities and scalability. Nonetheless, the iterative nature of diffusion models (DMs) results in high…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-12 · Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Generating temporally-consistent high-fidelity videos can be computationally expensive, especially over longer temporal spans. More-recent Diffusion Transformers (DiTs) -- despite making significant headway in this context -- have only…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-08 · Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie

Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. To reduce their substantial computational costs, feature caching techniques have been proposed to accelerate inference by…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-25 · Shikang Zheng, Liang Feng, Xinyu Wang, Qinming Zhou, Peiliang Cai, Chang Zou, Jiacheng Liu, Yuqi Lin, Junjie Chen, Yue Ma, Linfeng Zhang

Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen