Related papers: Elastic Diffusion Transformer

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

The Diffusion Transformer (DiT) architecture is the state-of-the-art paradigm for high-fidelity image generation, underpinning models like Stable Diffusion-3 and FLUX.1. However, deploying these models on resource-constrained mobile devices…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-18 · Kunpeng Du, Haizhen Xie, Sen Lu, Lei Yu, Binglei Bao, Huaao Tang, Chuntao Liu, Hao Wu, Yang Zhao, Zhicai Huang, Heyuan Gao, Zhijun Tu, Jie Hu, Xinghao Chen

One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers

Diffusion transformers (DiTs) achieve high generative quality but lock FLOPs to image resolution, limiting principled latency-quality trade-offs, and allocate computation uniformly across input spatial tokens, wasting resource allocation to…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-13 · Moayed Haji-Ali, Willi Menapace, Ivan Skorokhodov, Dogyun Park, Anil Kag, Michael Vasilkovsky, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin

Dynamic Diffusion Transformer

Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You

DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-15 · Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Diffusion Transformers (DiT) are renowned for their impressive generative performance; however, they are significantly constrained by considerable computational costs due to the quadratic complexity in self-attention and the extensive…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-24 · Shuning Chang, Pichao Wang, Jiasheng Tang, Fan Wang, Yi Yang

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Diffusion Transformers (DiTs) have emerged as a leading architecture for text-to-image synthesis, producing high-quality and photorealistic images. However, the quadratic scaling properties of the attention in DiTs hinder image generation…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-12 · Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan, Malcolm Chadwick, Luca Morreale, Mehdi Noroozi, Alberto Gil Ramos, Sourav Bhattacharya

DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-08 · Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum

Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen

Mixture of Distributions Matters: Dynamic Sparse Attention for Efficient Video Diffusion Transformers

While Diffusion Transformers (DiTs) have achieved notable progress in video generation, this long-sequence generation task remains constrained by the quadratic complexity inherent to self-attention mechanisms, creating significant barriers…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-04 · Yuxi Liu, Yipeng Hu, Zekun Zhang, Kunze Jiang, Kun Yuan

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. However, their practical application suffers from inherent dynamic feature instability, leading…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-10 · Guanjie Chen, Xinyu Zhao, Yucheng Zhou, Xiaoye Qu, Tianlong Chen, Yu Cheng

EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation

Diffusion Transformers (DiT) have established a new state-of-the-art in high-fidelity image synthesis; however, their massive computational complexity and memory requirements hinder local deployment on resource-constrained edge devices. In…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Sravanth Kodavanti, Manjunath Arveti, Sowmya Vajrala, Srinivas Miriyala, Vikram N R

Accelerating Diffusion Transformer via Error-Optimized Cache

Diffusion Transformer (DiT) is a crucial method for content generation. However, it needs a lot of time to sample. Many studies have attempted to use caching to reduce the time consumption of sampling. Existing caching methods accelerate…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-21 · Junxiang Qiu, Shuo Wang, Jinda Lu, Lin Liu, Houcheng Jiang, Xingyu Zhu, Yanbin Hao

Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers

Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation, yet their iterative denoising process incurs high computational costs. Existing training-free acceleration methods rely on…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-23 · Hanshuai Cui, Zhiqing Tang, Qianli Ma, Zhi Yao, Weijia Jia

Designing Parameter and Compute Efficient Diffusion Transformers using Distillation

Diffusion Transformers (DiTs) with billions of model parameters form the backbone of popular image and video generation models like DALL.E, Stable-Diffusion and SORA. Though these models are necessary in many low-latency applications like…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-21 · Vignesh Sundaresha

SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Dongting Hu, Aarush Gupta, Magzhan Gabidolla, Arpit Sahni, Huseyin Coskun, Yanyu Li, Yerlan Idelbayev, Ahsan Mahmood, Aleksei Lebedev, Dishani Lahiri, Anujraaj Goyal, Ju Hu, Mingming Gong, Sergey Tulyakov, Anil Kag

Memory-Efficient Fine-Tuning Diffusion Transformers via Dynamic Patch Sampling and Block Skipping

Diffusion Transformers (DiTs) have significantly enhanced text-to-image (T2I) generation quality, enabling high-quality personalized content creation. However, fine-tuning these models requires substantial computational complexity and…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Sunghyun Park, Jeongho Kim, Hyoungwoo Park, Debasmit Das, Sungrack Yun, Munawar Hayat, Jaegul Choo, Fatih Porikli, Seokeon Choi

E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources

Diffusion models have shown strong capabilities in generating high-quality images from text prompts. However, these models often require large-scale training data and significant computational resources to train, or suffer from heavy…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-03 · Tong Shen, Jingai Yu, Dong Zhou, Dong Li, Emad Barsoum

DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving

The Text-to-Video (T2V) model aims to generate dynamic and expressive videos from textual prompts. The generation pipeline typically involves multiple modules, such as language encoder, Diffusion Transformer (DiT), and Variational…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-17 · Heyang Huang, Cunchen Hu, Jiaqi Zhu, Ziyuan Gao, Liangliang Xu, Yizhou Shan, Yungang Bao, Sun Ninghui, Tianwei Zhang, Sa Wang

RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer

Diffusion Transformers (DiTs) excel at visual generation yet remain hampered by slow sampling. Existing training-free accelerators - step reduction, feature caching, and sparse attention - enhance inference speed but typically rely on a…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Wangbo Zhao, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Pengfei Zhou, Kai Wang, Bohan Zhuang, Zhangyang Wang, Fan Wang, Yang You

FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we…

Machine Learning · Computer Science · 2025-02-28 · Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, Edgar Schönfeld