Related papers: DiffMoE: Dynamic Token Selection for Scalable Diff…

Scaling Diffusion Transformers to 16 Billion Parameters

In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer that is scalable and competitive with dense networks while exhibiting highly optimized inference. DiT-MoE includes two simple designs: shared expert…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-10 · Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

RDPM: Solve Diffusion Probabilistic Models via Recurrent Token Prediction

Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis, operating diffusion processes on continuous VAE latents, which significantly differ from the text generation methods employed by…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-30 · Xiaoping Wu, Jie Hu, Xiaoming Wei

DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models

Generating high-quality labeled image datasets is crucial for training accurate and robust machine learning models in the field of computer vision. However, the process of manually labeling real images is often time-consuming and costly. To…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-04 · Michael Shenoda, Edward Kim

Diffusion for Natural Image Matting

We aim to leverage diffusion to address the challenging image matting task. However, the presence of high computational overhead and the inconsistency of noise sampling between the training and inference processes pose significant obstacles…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-05 · Yihan Hu, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei, Humphrey Shi

DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand,…

Machine Learning · Computer Science · 2022-11-30 · Kushagra Pandey, Avideep Mukherjee, Piyush Rai, Abhishek Kumar

An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques

Image classification serves as the cornerstone of computer vision, traditionally achieved through discriminative models based on deep neural networks. Recent advancements have introduced classification methods derived from generative…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-16 · Chunxiao Li, Xiaoxiao Wang, Boming Miao, Chuanlong Xie, Zizhe Wang, Yao Zhu

Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science · 2025-05-29 · Bocheng Li, Zhujin Gao, Linli Xu

DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks

Mixture-of-Experts (MoE) architectures have emerged as a powerful paradigm for scaling neural networks while maintaining computational efficiency. However, standard MoE implementations rely on two rigid design assumptions: (1) fixed Top-K…

Machine Learning · Computer Science · 2026-03-03 · Gökdeniz Gülmez

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation

Diffusion Transformer (DiT) has demonstrated remarkable performance in text-to-image generation; however, its large parameter size results in substantial inference overhead. Existing parameter compression methods primarily focus on pruning,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-13 · Youwei Zheng, Yuxi Ren, Xin Xia, Xuefeng Xiao, Xiaohua Xie

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Beyond high-fidelity image synthesis, diffusion models have recently exhibited promising results in dense visual perception tasks. However, most existing work treats diffusion models as a standalone component for perception tasks, employing…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-18 · Shuhong Zheng, Zhipeng Bao, Ruoyu Zhao, Martial Hebert, Yu-Xiong Wang

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-07 · Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

DiMo: Discrete Diffusion Modeling for Motion Generation and Understanding

Prior masked modeling motion generation methods predominantly study text-to-motion. We present DiMo, a discrete diffusion-style framework, which extends masked modeling to bidirectional text-motion understanding and generation. Unlike…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-09 · Ning Zhang, Zhengyu Li, Kwong Weng Loh, Mingxi Xu, Qi Wang, Zhengyu Wen, Xiaoyu He, Wei Zhao, Kehong Gong, Mingyuan Zhang

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science · 2024-08-21 · Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Mixture-of-Experts (MoE) has emerged as a powerful paradigm for scaling model capacity while preserving computational efficiency. Despite its notable success in large language models (LLMs), existing attempts to apply MoE to Diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-03 · Yujie Wei, Shiwei Zhang, Hangjie Yuan, Yujin Han, Zhekai Chen, Jiayu Wang, Difan Zou, Xihui Liu, Yingya Zhang, Yu Liu, Hongming Shan

Token Merging for Fast Stable Diffusion

The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Daniel Bolya, Judy Hoffman

DPBridge: Latent Diffusion Bridge for Dense Prediction

Diffusion models demonstrate remarkable capabilities in capturing complex data distributions and have achieved compelling results in many generative tasks. While they have recently been extended to dense prediction tasks such as depth…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-16 · Haorui Ji, Taojun Lin, Hongdong Li

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

Recently, large-scale diffusion models, e.g., Stable Diffusion and DALL-E 2, have shown remarkable results on image synthesis. On the other hand, large-scale cross-modal pre-trained models (e.g., CLIP, ALIGN, and FILIP) are competent for…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-21 · Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei Zhang, Hang Xu

DiffVQE: Hybrid Diffusion Voice Quality Enhancement Under Acoustic Echo and Noise

Acoustic echo and background noise pose challenges for speech enhancement in hands-free systems and speakerphones. Discriminatively trained end-to-end methods represent a powerful solution for joint acoustic echo control (AEC) and denoising.…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-05-12 · Haljan Lugo Girao, Ernst Seidel, Pejman Mowlaee, Ziyue Zhao, Tim Fingscheidt

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science · 2022-11-09 · Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Do text-free diffusion models learn discriminative visual representations?

While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-25 · Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Jun Ohya, Abhinav Shrivastava