English
Related papers

Related papers: DiffMoE: Dynamic Token Selection for Scalable Diff…

200 papers

In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer, that is scalable and competitive with dense networks while exhibiting highly optimized inference. The DiT-MoE includes two simple designs: shared expert…

Computer Vision and Pattern Recognition · Computer Science 2024-09-10 Zhengcong Fei , Mingyuan Fan , Changqian Yu , Debang Li , Junshi Huang

Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis, operating diffusion processes on continuous VAE latent, which significantly differ from the text generation methods employed by…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Xiaoping Wu , Jie Hu , Xiaoming Wei

Generating high-quality labeled image datasets is crucial for training accurate and robust machine learning models in the field of computer vision. However, the process of manually labeling real images is often time-consuming and costly. To…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Michael Shenoda , Edward Kim

We aim to leverage diffusion to address the challenging image matting task. However, the presence of high computational overhead and the inconsistency of noise sampling between the training and inference processes pose significant obstacles…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 Yihan Hu , Yiheng Lin , Wei Wang , Yao Zhao , Yunchao Wei , Humphrey Shi

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand,…

Machine Learning · Computer Science 2022-11-30 Kushagra Pandey , Avideep Mukherjee , Piyush Rai , Abhishek Kumar

Image classification serves as the cornerstone of computer vision, traditionally achieved through discriminative models based on deep neural networks. Recent advancements have introduced classification methods derived from generative…

Computer Vision and Pattern Recognition · Computer Science 2024-12-16 Chunxiao Li , Xiaoxiao Wang , Boming Miao , Chuanlong Xie , Zizhe Wang , Yao Zhu

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science 2025-05-29 Bocheng Li , Zhujin Gao , Linli Xu

Mixture-of-Experts (MoE) architectures have emerged as a powerful paradigm for scaling neural networks while maintaining computational efficiency. However, standard MoE implementations rely on two rigid design assumptions: (1) fixed Top-K…

Machine Learning · Computer Science 2026-03-03 Gökdeniz Gülmez

Diffusion Transformer (DiT) has demonstrated remarkable performance in text-to-image generation; however, its large parameter size results in substantial inference overhead. Existing parameter compression methods primarily focus on pruning,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-13 Youwei Zheng , Yuxi Ren , Xin Xia , Xuefeng Xiao , Xiaohua Xie

Beyond high-fidelity image synthesis, diffusion models have recently exhibited promising results in dense visual perception tasks. However, most existing work treats diffusion models as a standalone component for perception tasks, employing…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Shuhong Zheng , Zhipeng Bao , Ruoyu Zhao , Martial Hebert , Yu-Xiong Wang

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu , Ji Liu , Ziqiong Liu , Dong Li , Junhai Yong , Bin Wang , Emad Barsoum

Prior masked modeling motion generation methods predominantly study text-to-motion. We present DiMo, a discrete diffusion-style framework, which extends masked modeling to bidirectional text--motion understanding and generation. Unlike…

Computer Vision and Pattern Recognition · Computer Science 2026-02-09 Ning Zhang , Zhengyu Li , Kwong Weng Loh , Mingxi Xu , Qi Wang , Zhengyu Wen , Xiaoyu He , Wei Zhao , Kehong Gong , Mingyuan Zhang

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science 2024-08-21 Chunting Zhou , Lili Yu , Arun Babu , Kushal Tirumala , Michihiro Yasunaga , Leonid Shamis , Jacob Kahn , Xuezhe Ma , Luke Zettlemoyer , Omer Levy

Mixture-of-Experts (MoE) has emerged as a powerful paradigm for scaling model capacity while preserving computational efficiency. Despite its notable success in large language models (LLMs), existing attempts to apply MoE to Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Yujie Wei , Shiwei Zhang , Hangjie Yuan , Yujin Han , Zhekai Chen , Jiayu Wang , Difan Zou , Xihui Liu , Yingya Zhang , Yu Liu , Hongming Shan

The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Daniel Bolya , Judy Hoffman

Diffusion models demonstrate remarkable capabilities in capturing complex data distributions and have achieved compelling results in many generative tasks. While they have recently been extended to dense prediction tasks such as depth…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Haorui Ji , Taojun Lin , Hongdong Li

Recently, large-scale diffusion models, e.g., Stable diffusion and DallE2, have shown remarkable results on image synthesis. On the other hand, large-scale cross-modal pre-trained models (e.g., CLIP, ALIGN, and FILIP) are competent for…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Runhui Huang , Jianhua Han , Guansong Lu , Xiaodan Liang , Yihan Zeng , Wei Zhang , Hang Xu

Acoustic echo and background noise pose challenges on speech enhancement in hands-free systems and speakerphones. Discriminatively trained end-to-end methods represent a powerful solution for joint acoustic echo control (AEC) and denoising.…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-12 Haljan Lugo Girao , Ernst Seidel , Pejman Mowlaee , Ziyue Zhao , Tim Fingscheidt

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We…

Computer Vision and Pattern Recognition · Computer Science 2024-09-25 Soumik Mukhopadhyay , Matthew Gwilliam , Yosuke Yamaguchi , Vatsal Agarwal , Namitha Padmanabhan , Archana Swaminathan , Tianyi Zhou , Jun Ohya , Abhinav Shrivastava
‹ Prev 1 2 3 10 Next ›