Related papers: Multi-Architecture Multi-Expert Diffusion Models

Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe

Recent efforts on Diffusion Mixture-of-Experts (MoE) models have primarily focused on developing more sophisticated routing mechanisms. However, we observe that the underlying architectural configuration space remains markedly…

Machine Learning · Computer Science · 2025-12-02 · Yahui Liu, Yang Yue, Jingyuan Zhang, Chenxi Sun, Yang Zhou, Wencong Zeng, Ruiming Tang, Guorui Zhou

Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-step process: introducing noise into training samples and then employing a model to convert random noise into new…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-13 · Huijie Zhang, Yifu Lu, Ismail Alkhouri, Saiprasad Ravishankar, Dogyoon Song, Qing Qu

MoFE: Mixture of Frozen Experts Architecture

We propose the Mixture of Frozen Experts (MoFE) architecture, which integrates Parameter-efficient Fine-tuning (PEFT) and the Mixture of Experts (MoE) architecture to enhance both training efficiency and model scalability. By freezing the…

Computation and Language · Computer Science · 2025-03-11 · Jean Seo, Jaeyoon Kim, Hyopil Shin

Staleness-Centric Optimizations for Parallel Diffusion MoE Inference

Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication bottlenecks. State-of-the-art methods…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-01 · Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang

AMDM-SE: Attention-based Multichannel Diffusion Model for Speech Enhancement

Diffusion models have recently achieved impressive results in reconstructing images from noisy inputs, and similar ideas have been applied to speech enhancement by treating time-frequency representations as images. With the ubiquity of…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-01-21 · Renana Opochinsky, Sharon Gannot

MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts

We propose MoE-DiffuSeq, a diffusion-based framework for efficient long-form text generation that integrates sparse attention with a Mixture-of-Experts (MoE) architecture. Existing sequence diffusion models suffer from prohibitive…

Computation and Language · Computer Science · 2026-01-08 · Alexandros Christoforos, Chadbourne Davis

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps…

Machine Learning · Computer Science · 2025-03-26 · Kai Wang, Mingjia Shi, Yukun Zhou, Zekai Li, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark

Efficient inference is a critical challenge in deep generative modeling, particularly as diffusion models grow in capacity and complexity. While increased complexity often improves accuracy, it raises compute costs, latency, and memory…

Machine Learning · Computer Science · 2025-09-24 · Siu Hang Ho, Prasad Ganesan, Nguyen Duong, Daniel Schlabig

Mixture-of-Experts Diffusion Models for Adaptive Massive MIMO Channel Estimation via Variational Bayesian Inference

Channel estimation is essential to massive multiple-input multiple-output (MIMO) systems. While recent generative model-based approaches using lightweight diffusion models (DMs) have achieved superior performance, they typically rely on a…

Signal Processing · Electrical Eng. & Systems · 2026-05-19 · Zhuorui Jiang, Jun Fang, Boyu Ning, Hongbin Li, Ying-Chang Liang

Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling

Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloading further incurs off-chip memory…

Hardware Architecture · Computer Science · 2026-03-31 · Songchen Ma, Hongyi Li, Weihao Zhang, Yonghao Tan, Pingcheng Dong, Yu Liu, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Luhong Liang, Kwang-Ting Cheng

Analyzing and Improving the Training Dynamics of Diffusion Models

Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-21 · Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine

SparseDM: Toward Sparse Efficient Diffusion Models

Diffusion models represent a powerful family of generative models widely used for image and video generation. However, the time-consuming deployment, long inference time, and requirements on large memory hinder their applications on…

Machine Learning · Computer Science · 2025-04-18 · Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu

Diffusion Adaptation Over Clustered Multitask Networks Based on the Affine Projection Algorithm

Distributed adaptive networks achieve better estimation performance by exploiting both temporal and spatial diversity while consuming few resources. Recent works have studied the single task distributed estimation problem, in which the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-10-02 · Vinay Chakravarthi Gogineni, Mrityunjoy Chakraborty

Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning

Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across multiple timesteps to enhance training efficiency. However, since the…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-17 · Qianli Ma, Xuefei Ning, Dongrui Liu, Li Niu, Linfeng Zhang

MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models

Diffusion-based text-to-image generation models trade latency for quality: small models are fast but generate lower-quality images, while large models produce better images but are slow. We present MoDM, a novel caching-based serving system…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-08-05 · Yuchen Xia, Divyam Sharma, Yichao Yuan, Souvik Kundu, Nishil Talati

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models are becoming larger to capture more complex capabilities, their…

Machine Learning · Computer Science · 2024-12-18 · Moritz Reuss, Jyothish Pari, Pulkit Agrawal, Rudolf Lioutikov

DiffETM: Diffusion Process Enhanced Embedded Topic Model

The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution conforms to the logistic normal distribution for easier optimization. However, this assumption oversimplifies the real…

Computation and Language · Computer Science · 2025-01-03 · Wei Shao, Mingyang Liu, Linqi Song

An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques

Image classification serves as the cornerstone of computer vision, traditionally achieved through discriminative models based on deep neural networks. Recent advancements have introduced classification methods derived from generative…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-16 · Chunxiao Li, Xiaoxiao Wang, Boming Miao, Chuanlong Xie, Zizhe Wang, Yao Zhu

A Survey on Inference Optimization Techniques for Mixture of Experts Models

The emergence of large-scale Mixture of Experts (MoE) models represents a significant advancement in artificial intelligence, offering enhanced model capacity and computational efficiency through conditional computation. However, deploying…

Machine Learning · Computer Science · 2025-01-23 · Jiacheng Liu, Peng Tang, Wenfeng Wang, Yuhang Ren, Xiaofeng Hou, Pheng-Ann Heng, Minyi Guo, Chao Li

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-task learning including low-level upstream tasks such as concurrent removal of multiple adverse weather effects. However, the conventional MoE…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-29 · Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang