Related papers: Binary-Integer-Programming Based Algorithm for Exp…

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will…

Machine Learning · Computer Science 2024-08-29 Lean Wang, Huazuo Gao, Chenggang Zhao, Xu Sun, Damai Dai

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs…

Machine Learning · Computer Science 2026-04-22 Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang

Load Balancing Mixture of Experts with Similarity Preserving Routers

Sparse Mixture of Experts (MoE) models offer a scalable and efficient architecture for training large neural networks by activating only a subset of parameters ("experts") for each input. A learned router computes a distribution over these…

Machine Learning · Computer Science 2025-10-14 Nabil Omi, Siddhartha Sen, Ali Farhadi

Fine-grained MoE Load Balancing with Linear Programming

Mixture-of-Experts (MoE) has emerged as a promising approach to scale up deep learning models due to its significant reduction in computational resources. However, the dynamic nature of MoE leads to load imbalance among experts, severely…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-16 Chenqi Zhao, Wenfei Wu, Linhai Song, Yuchen Xu, Yitao Yuan

Advancing Expert Specialization for Better MoE

Mixture-of-Experts (MoE) models enable efficient scaling of large language models (LLMs) by activating only a subset of experts per input. However, we observe that the commonly used auxiliary load balancing loss often leads to expert…

Computation and Language · Computer Science 2026-01-27 Hongcan Guo, Haolang Lu, Guoshun Nan, Bolun Chu, Jialin Zhuang, Yuan Yang, Wenhao Che, Xinye Cao, Sicong Leng, Qimei Cui, Xudong Jiang

$\phi$-Balancing for Mixture-of-Experts Training

Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introducing bias…

Machine Learning · Computer Science 2026-05-18 Lizhang Chen, Jonathan Li, Qi Wang, Runlong Liao, Shuozhe Li, Chen Liang, Ni Lao, Qiang Liu

Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts

Mixture-of-Experts (MoE) models are typically pre-trained with explicit load-balancing constraints to ensure statistically balanced expert routing. Despite this, we observe that even well-trained MoE models exhibit significantly imbalanced…

Machine Learning · Computer Science 2026-01-27 Xuan-Phi Nguyen, Shrey Pandit, Austin Xu, Caiming Xiong, Shafiq Joty

Training Experimentally Robust and Interpretable Binarized Regression Models Using Mixed-Integer Programming

In this paper, we explore a model-based approach to training robust and interpretable binarized regression models for multiclass classification tasks using Mixed-Integer Programming (MIP). Our MIP model balances the optimization of prediction…

Machine Learning · Computer Science 2022-03-22 Sanjana Tule, Nhi Ha Lan Le, Buser Say

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and…

Artificial Intelligence · Computer Science 2026-05-15 Juntong Wu, Jialiang Cheng, Qishen Yin, Yue Dai, Yuliang Yan, Fuyu Lv, Ou Dan, Li Yuan

Hierarchical Mixture-of-Experts with Two-Stage Optimization

Sparse Mixture-of-Experts (MoE) models scale capacity by routing each token to a small subset of experts. However, their routers exhibit a fundamental trade-off: strong load balancing can suppress expert specialization, while aggressive…

Machine Learning · Computer Science 2026-05-12 Gleb Molodtsov, Alexander Miasnikov, Aleksandr Beznosikov

ReaLB: Real-Time Load Balancing for Multimodal MoE Inference

Mixture-of-Experts (MoE) architectures are widely used in modern large language models and multimodal models. However, inference efficiency is often limited by highly dynamic and skewed expert workloads across different modalities. During…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-12 Yingping Wang, Yi Wu, Xiangyu Wu, Junwei Cui, Weilin Cai, Zhijiang Guo, Jiayi Huang

Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts

Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently…

Machine Learning · Computer Science 2025-06-27 Jiajie Yang

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing

Mixture-of-Experts (MoE) model architecture has emerged as a promising solution for scaling transformer models efficiently, offering sparse activation that reduces computational costs while increasing model capacity. However, as MoE models…

Machine Learning · Computer Science 2025-02-11 Seokjin Go, Divya Mahajan

From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing

Mixture-of-Experts (MoE) models can scale parameter capacity by routing each token to a subset of experts through a learned gate function. While conditional routing reduces training costs, it shifts the burden to inference memory: expert…

Machine Learning · Computer Science 2025-10-07 Rana Shahout, Colin Cai, Yilun Du, Minlan Yu, Michael Mitzenmacher

Slicing and Dicing: Configuring Optimal Mixtures of Experts

Mixture-of-Experts (MoE) architectures have become standard in large language models, yet many of their core design choices - expert count, granularity, shared experts, load balancing, token dropping - have only been studied one or two at a…

Machine Learning · Computer Science 2026-05-13 Margaret Li, Sneha Kudugunta, Danielle Rothermel, Luke Zettlemoyer

Routing-Free Mixture-of-Experts

Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE which eliminates any hard-coded centralized designs including external routers, Softmax,…

Machine Learning · Computer Science 2026-04-02 Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma

Expert Divergence Learning for MoE-based Language Models

The Mixture-of-Experts (MoE) architecture is a powerful technique for scaling language models, yet it often suffers from expert homogenization, where experts learn redundant functionalities, thereby limiting MoE's full potential. To address…

Machine Learning · Computer Science 2026-03-03 Jiaang Li, Haibin Chen, Langming Liu, Yujin Yuan, Yadao Wang, Yizhen Zhang, Chengting Yu, Xin Tong, Weidong Zhang, Shilei Liu, Wenbo Su, Bo Zheng

ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning

Load imbalance is a long-standing challenge in Mixture-of-Experts (MoE) training and is exacerbated in reinforcement learning (RL) for LLMs, where hot experts can shift frequently across micro-batches. Existing MoE training systems rely on…

Machine Learning · Computer Science 2026-05-12 Chao Jin, Xinming Wei, Yinmin Zhong, Chengxu Yang, Bingyang Wu, Ruidong Zhu, Zili Zhang, Yuliang Liu, Xin Jin

Harder Tasks Need More Experts: Dynamic Routing in MoE Models

In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input…

Machine Learning · Computer Science 2024-03-13 Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models

Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between accuracy and computation in an MoE model typically…

Machine Learning · Computer Science 2026-02-09 Nurbek Tastan, Stefanos Laskaridis, Karthik Nandakumar, Samuel Horvath