Related papers: A-SelecT: Automatic Timestep Selection for Diffusi…

Dynamic Diffusion Transformer

Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You

DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-15 · Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

Predict to Skip: Linear Multistep Feature Forecasting for Efficient Diffusion Transformers

Diffusion Transformers (DiT) have emerged as a widely adopted backbone for high-fidelity image and video generation, yet their iterative denoising process incurs high computational costs. Existing training-free acceleration methods rely on…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-23 · Hanshuai Cui, Zhiqing Tang, Qianli Ma, Zhi Yao, Weijia Jia

Elastic Diffusion Transformer

Diffusion Transformers (DiT) have demonstrated remarkable generative capabilities but remain highly computationally expensive. Previous acceleration methods, such as pruning and distillation, typically rely on a fixed computational…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Jiangshan Wang, Zeqiang Lai, Jiarui Chen, Jiayi Guo, Hang Guo, Xiu Li, Xiangyu Yue, Chunchao Guo

DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers

Recent breakthroughs in Diffusion Transformers (DiTs) have revolutionized the field of visual synthesis due to their superior scalability. To facilitate DiTs' capability of capturing meaningful internal representations, recent works such as…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-05 · Mengping Yang, Zhiyu Tan, Binglei Li, Xiaomeng Yang, Hesen Chen, Hao Li

U-DiT Policy: U-shaped Diffusion Transformers for Robotic Manipulation

Diffusion-based methods have been acknowledged as a powerful paradigm for end-to-end visuomotor control in robotics. Most existing approaches adopt a Diffusion Policy in U-Net architecture (DP-U), which, while effective, suffers from…

Robotics · Computer Science · 2025-09-30 · Linzhi Wu, Aoran Mei, Xiyue Wang, Guo-Niu Zhu, Zhongxue Gan

Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback

The advancements in generative modeling, particularly the advent of diffusion models, have sparked a fundamental question: how can these models be effectively used for discriminative tasks? In this work, we find that generative models can…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-01 · Mihir Prabhudesai, Tsung-Wei Ke, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Diffusion Transformer (DiT) has emerged as the new trend in generative diffusion models for image generation. In view of the extremely slow convergence of typical DiTs, recent breakthroughs have been driven by a mask strategy that significantly…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-26 · Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen

DRDT3: Diffusion-Refined Decision Test-Time Training Model

Decision Transformer (DT), a trajectory modelling method, has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches on various classic control tasks. However, it struggles to learn optimal…

Machine Learning · Computer Science · 2025-09-18 · Xingshuai Huang, Di Wu, Benoit Boulet

DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving

In autonomous driving, deep models have shown remarkable performance across various visual perception tasks, which demand high-quality and highly diverse training datasets. Such datasets are expected to cover various driving scenarios…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-23 · Jiahang Tu, Wei Ji, Hanbin Zhao, Chao Zhang, Roger Zimmermann, Hui Qian

Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Diffusion Transformer (DiT), a promising diffusion model for visual generation, demonstrates impressive performance but incurs significant computational overhead. Intriguingly, analysis of pre-trained DiT models reveals that global…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Yuang Ai, Qihang Fan, Xuefeng Hu, Zhenheng Yang, Ran He, Huaibo Huang

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

Learning from a large corpus of data, pre-trained models have achieved impressive progress. As a popular form of generative pre-training, diffusion models capture both low-level visual knowledge and high-level semantic relations. In this…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-20 · Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation

Self-supervised learning has garnered increasing attention in time series analysis for benefiting various downstream tasks and reducing reliance on labeled data. Despite its effectiveness, existing methods often struggle to comprehensively…

Machine Learning · Computer Science · 2025-06-12 · Daoyu Wang, Mingyue Cheng, Zhiding Liu, Qi Liu

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Recent Diffusion Transformers (DiTs) have shown impressive capabilities in generating high-quality single-modality content, including images, videos, and audio. However, it is still under-explored whether the transformer-based diffuser can…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-13 · Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-31 · Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang

SHIFT: Steering Hidden Intermediates in Flow Transformers

Diffusion models have become leading approaches for high-fidelity image generation. Recent DiT-based diffusion models, in particular, achieve strong prompt adherence while producing high-quality samples. We propose SHIFT, a simple but…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-13 · Nina Konovalova, Andrey Kuznetsov, Aibek Alanov

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

Recent advances in video generation models have significantly accelerated video generation and related downstream tasks. Among these, video stylization holds important research value in areas such as immersive applications and artistic…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-16 · Hengye Lyu, Zisu Li, Yue Hong, Yueting Weng, Jiaxin Shi, Hanwang Zhang, Chen Liang

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One major efficiency…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin

Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory

Dataset distillation enables the training of deep neural networks with comparable performance in significantly reduced time by compressing large datasets into small and representative ones. Although the introduction of generative models has…

Machine Learning · Computer Science · 2025-05-27 · Mingzhuo Li, Guang Li, Jiafeng Mao, Takahiro Ogawa, Miki Haseyama