Related papers: DDiT: Dynamic Patch Scheduling for Efficient Diffu…

200 papers

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum

Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You

Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation, yet their iterative denoising process incurs substantial computational cost during inference. Existing caching methods…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Guandong Li

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

Diffusion transformers (DiTs) adopt Patchify, mapping patch representations to token representations through linear projections, to adjust the number of tokens input to DiT blocks and thus the computation cost. Instead of a single patch…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Hui Li, Baoyou Chen, Liwei Zhang, Jiaye Li, Jingdong Wang, Siyu Zhu

Diffusion transformers (DiT) have demonstrated exceptional performance in video generation. However, their large number of parameters and high computational complexity limit their deployment on edge devices. Quantization can reduce storage…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Weilun Feng, Chuanguang Yang, Haotong Qin, Xiangqi Li, Yu Wang, Zhulin An, Libo Huang, Boyu Diao, Zixiang Zhao, Yongjun Xu, Michele Magno

Diffusion-based video editing has emerged as an important paradigm for high-quality and flexible content generation. However, despite their generality and strong modeling capacity, Diffusion Transformers (DiT) remain computationally…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Tianyi Liu, Ye Lu, Linfeng Zhang, Chen Cai, Jianjun Gao, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

Diffusion Transformers (DiT) have become the dominant methods in image and video generation yet still suffer substantial computational costs. As an effective approach for DiT acceleration, feature caching methods are designed to cache the…

Machine Learning · Computer Science 2025-11-19 Chang Zou, Evelyn Zhang, Runlin Guo, Haohang Xu, Conghui He, Xuming Hu, Linfeng Zhang

Generating temporally-consistent high-fidelity videos can be computationally expensive, especially over longer temporal spans. More-recent Diffusion Transformers (DiTs) -- despite making significant headway in this context -- have only…

Computer Vision and Pattern Recognition · Computer Science 2024-11-08 Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie

Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation, yet their iterative denoising process incurs high computational costs. Existing training-free acceleration methods rely on…

Computer Vision and Pattern Recognition · Computer Science 2026-02-23 Hanshuai Cui, Zhiqing Tang, Qianli Ma, Zhi Yao, Weijia Jia

Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science 2025-02-20 Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

Diffusion Transformers (DiT) are renowned for their impressive generative performance; however, they are significantly constrained by considerable computational costs due to the quadratic complexity in self-attention and the extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Shuning Chang, Pichao Wang, Jiasheng Tang, Fan Wang, Yi Yang

Diffusion transformers have demonstrated remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo

Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we…

Diffusion Transformers (DiTs) have significantly enhanced text-to-image (T2I) generation quality, enabling high-quality personalized content creation. However, fine-tuning these models requires substantial computational complexity and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Sunghyun Park, Jeongho Kim, Hyoungwoo Park, Debasmit Das, Sungrack Yun, Munawar Hayat, Jaegul Choo, Fatih Porikli, Seokeon Choi

Diffusion Transformers (DiTs) deliver remarkable image and video generation quality but incur high computational cost, limiting scalability and on-device deployment. We introduce CoReDiT, a structured token pruning framework for DiTs across…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Zhuojin Li, Hsin-Pai Cheng, Hong Cai, Shizhong Han, Fatih Porikli

While Diffusion Transformers (DiTs) have achieved notable progress in video generation, this long-sequence generation task remains constrained by the quadratic complexity inherent to self-attention mechanisms, creating significant barriers…

Computer Vision and Pattern Recognition · Computer Science 2026-02-04 Yuxi Liu, Yipeng Hu, Zekun Zhang, Kunze Jiang, Kun Yuan

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park