Related papers: Multi-Architecture Multi-Expert Diffusion Models
Recent efforts on Diffusion Mixture-of-Experts (MoE) models have primarily focused on developing more sophisticated routing mechanisms. However, we observe that the underlying architectural configuration space remains markedly…
Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-steps process: introducing noise into training samples and then employing a model to convert random noise into new…
We propose the Mixture of Frozen Experts (MoFE) architecture, which integrates Parameter-efficient Fine-tuning (PEFT) and the Mixture of Experts (MoE) architecture to enhance both training efficiency and model scalability. By freezing the…
Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication bottlenecks. State-of-the-art methods…
Diffusion models have recently achieved impressive results in reconstructing images from noisy inputs, and similar ideas have been applied to speech enhancement by treating time-frequency representations as images. With the ubiquity of…
We propose \textbf{MoE-DiffuSeq}, a diffusion-based framework for efficient long-form text generation that integrates sparse attention with a Mixture-of-Experts (MoE) architecture. Existing sequence diffusion models suffer from prohibitive…
Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps…
Efficient inference is a critical challenge in deep generative modeling, particularly as diffusion models grow in capacity and complexity. While increased complexity often improves accuracy, it raises compute costs, latency, and memory…
Channel estimation is essential to massive multiple-input multiple-output (MIMO) systems. While recent generative model-based approaches using lightweight diffusion models (DMs) have achieved superior performance, they typically rely on a…
Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloading further incurs off-chip memory…
Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM…
Diffusion models represent a powerful family of generative models widely used for image and video generation. However, the time-consuming deployment, long inference time, and requirements on large memory hinder their applications on…
Distributed adaptive networks achieve better estimation performance by exploiting temporal and as well spatial diversity while consuming few resources. Recent works have studied the single task distributed estimation problem, in which the…
Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across multiple timesteps to enhance training efficiency. However, since the…
Diffusion-based text-to-image generation models trade latency for quality: small models are fast but generate lower-quality images, while large models produce better images but are slow. We present MoDM, a novel caching-based serving system…
Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models are becoming larger to capture more complex capabilities, their…
The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution conforms to the logistic normal distribution for easier optimization. However, this assumption oversimplifies the real…
Image classification serves as the cornerstone of computer vision, traditionally achieved through discriminative models based on deep neural networks. Recent advancements have introduced classification methods derived from generative…
The emergence of large-scale Mixture of Experts (MoE) models represents a significant advancement in artificial intelligence, offering enhanced model capacity and computational efficiency through conditional computation. However, deploying…
The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-task learning including low-level upstream tasks such as concurrent removal of multiple adverse weather effects. However, the conventional MoE…