Related papers: Dynamic Diffusion Transformer

DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

Elastic Diffusion Transformer

Diffusion Transformers (DiT) have demonstrated remarkable generative capabilities but remain highly computationally expensive. Previous acceleration methods, such as pruning and distillation, typically rely on a fixed computational…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Jiangshan Wang, Zeqiang Lai, Jiarui Chen, Jiayi Guo, Hang Guo, Xiu Li, Xiangyu Yue, Chunchao Guo

Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers

Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation, yet their iterative denoising process incurs high computational costs. Existing training-free acceleration methods rely on…

Computer Vision and Pattern Recognition · Computer Science 2026-02-23 Hanshuai Cui, Zhiqing Tang, Qianli Ma, Zhi Yao, Weijia Jia

DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in image and video generation, but their success comes at the cost of heavy computation. This inefficiency is largely due to the fixed tokenization process, which uses…

Computer Vision and Pattern Recognition · Computer Science 2026-02-20 Dahye Kim, Deepti Ghadiyaram, Raghudeep Gadde

$\Delta$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One major efficiency…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin

DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum

D$^2$iT: Dynamic Diffusion Transformer for Accurate Image Generation

Diffusion models are widely recognized for their ability to generate high-fidelity images. Despite the excellent performance and scalability of the Diffusion Transformer (DiT) architecture, it applies fixed compression across different…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Weinan Jia, Mengqi Huang, Nan Chen, Lei Zhang, Zhendong Mao

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Diffusion Transformers (DiT) are renowned for their impressive generative performance; however, they are significantly constrained by considerable computational costs due to the quadratic complexity in self-attention and the extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Shuning Chang, Pichao Wang, Jiasheng Tang, Fan Wang, Yi Yang

FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we…

Machine Learning · Computer Science 2025-02-28 Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, Edgar Schönfeld

DiffiT: Diffusion Vision Transformers for Image Generation

Diffusion models with their powerful expressivity and high sample quality have achieved State-Of-The-Art (SOTA) performance in the generative domain. The pioneering Vision Transformer (ViT) has also demonstrated strong modeling capabilities…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

The Diffusion Transformer (DiT) architecture is the state-of-the-art paradigm for high-fidelity image generation, underpinning models like Stable Diffusion-3 and FLUX.1. However, deploying these models on resource-constrained mobile devices…

Computer Vision and Pattern Recognition · Computer Science 2026-05-18 Kunpeng Du, Haizhen Xie, Sen Lu, Lei Yu, Binglei Bao, Huaao Tang, Chuntao Liu, Hao Wu, Yang Zhao, Zhicai Huang, Heyuan Gao, Zhijun Tu, Jie Hu, Xinghao Chen

A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning

Diffusion models have significantly reshaped the field of generative artificial intelligence and are now increasingly explored for their capacity in discriminative representation learning. Diffusion Transformer (DiT) has recently gained…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Changyu Liu, James Chenhao Liang, Wenhao Yang, Yiming Cui, Jinghao Yang, Tianyang Wang, Qifan Wang, Dongfang Liu, Cheng Han

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Diffusion Transformer (DiT), a promising diffusion model for visual generation, demonstrates impressive performance but incurs significant computational overhead. Intriguingly, analysis of pre-trained DiT models reveals that global…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Yuang Ai, Qihang Fan, Xuefeng Hu, Zhenheng Yang, Ran He, Huaibo Huang

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of…

Computer Vision and Pattern Recognition · Computer Science 2025-06-02 Mang Ning, Mingxiao Li, Jianlin Su, Haozhe Jia, Lanmiao Liu, Martin Beneš, Wenshuo Chen, Albert Ali Salah, Itir Onal Ertugrul

SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Dongting Hu, Aarush Gupta, Magzhan Gabidolla, Arpit Sahni, Huseyin Coskun, Yanyu Li, Yerlan Idelbayev, Ahsan Mahmood, Aleksei Lebedev, Dishani Lahiri, Anujraaj Goyal, Ju Hu, Mingming Gong, Sergey Tulyakov, Anil Kag

SDiT: Semantic Region-Adaptive for Diffusion Transformers

Diffusion Transformers (DiTs) achieve state-of-the-art performance in text-to-image synthesis but remain computationally expensive due to the iterative nature of denoising and the quadratic cost of global attention. In this work, we observe…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Bowen Lin, Fanjiang Ye, Yihua Liu, Zhenghui Guo, Boyuan Zhang, Weijian Zheng, Yufan Xu, Tiancheng Xing, Yuke Wang, Chengming Zhang

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Diffusion Transformers (DiTs) have emerged as a leading architecture for text-to-image synthesis, producing high-quality and photorealistic images. However, the quadratic scaling properties of the attention in DiTs hinder image generation…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan, Malcolm Chadwick, Luca Morreale, Mehdi Noroozi, Alberto Gil Ramos, Sourav Bhattacharya

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. However, their practical application suffers from inherent dynamic feature instability, leading…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Guanjie Chen, Xinyu Zhao, Yucheng Zhou, Xiaoye Qu, Tianlong Chen, Yu Cheng

Dynamic Differential Linear Attention: Enhancing Linear Diffusion Transformer for High-Quality Image Generation

Diffusion transformers (DiTs) have emerged as a powerful architecture for high-fidelity image generation, yet the quadratic cost of self-attention poses a major scalability bottleneck. To address this, linear attention mechanisms have been…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Boyuan Cao, Xingbo Yao, Chenhui Wang, Jiaxin Ye, Yujie Wei, Hongming Shan