Related papers: Binary-Integer-Programming Based Algorithm for Exp…

200 papers

For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will…

Machine Learning · Computer Science 2024-08-29 Lean Wang, Huazuo Gao, Chenggang Zhao, Xu Sun, Damai Dai

Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs…

Machine Learning · Computer Science 2026-04-22 Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang

Sparse Mixture of Experts (MoE) models offer a scalable and efficient architecture for training large neural networks by activating only a subset of parameters ("experts") for each input. A learned router computes a distribution over these…

Machine Learning · Computer Science 2025-10-14 Nabil Omi, Siddhartha Sen, Ali Farhadi

Mixture-of-Experts (MoE) has emerged as a promising approach to scale up deep learning models due to its significant reduction in computational resources. However, the dynamic nature of MoE leads to load imbalance among experts, severely…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-16 Chenqi Zhao, Wenfei Wu, Linhai Song, Yuchen Xu, Yitao Yuan

Mixture-of-Experts (MoE) models enable efficient scaling of large language models (LLMs) by activating only a subset of experts per input. However, we observe that the commonly used auxiliary load balancing loss often leads to expert…

Computation and Language · Computer Science 2026-01-27 Hongcan Guo, Haolang Lu, Guoshun Nan, Bolun Chu, Jialin Zhuang, Yuan Yang, Wenhao Che, Xinye Cao, Sicong Leng, Qimei Cui, Xudong Jiang

Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introducing bias…

Machine Learning · Computer Science 2026-05-18 Lizhang Chen, Jonathan Li, Qi Wang, Runlong Liao, Shuozhe Li, Chen Liang, Ni Lao, Qiang Liu

Mixture-of-Experts (MoE) models are typically pre-trained with explicit load-balancing constraints to ensure statistically balanced expert routing. Despite this, we observe that even well-trained MoE models exhibit significantly imbalanced…

Machine Learning · Computer Science 2026-01-27 Xuan-Phi Nguyen, Shrey Pandit, Austin Xu, Caiming Xiong, Shafiq Joty

In this paper, we explore a model-based approach to training robust and interpretable binarized regression models for multiclass classification tasks using Mixed-Integer Programming (MIP). Our MIP model balances the optimization of prediction…

Machine Learning · Computer Science 2022-03-22 Sanjana Tule, Nhi Ha Lan Le, Buser Say

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and…

Artificial Intelligence · Computer Science 2026-05-15 Juntong Wu, Jialiang Cheng, Qishen Yin, Yue Dai, Yuliang Yan, Fuyu Lv, Ou Dan, Li Yuan

Sparse Mixture-of-Experts (MoE) models scale capacity by routing each token to a small subset of experts. However, their routers exhibit a fundamental trade-off: strong load balancing can suppress expert specialization, while aggressive…

Machine Learning · Computer Science 2026-05-12 Gleb Molodtsov, Alexander Miasnikov, Aleksandr Beznosikov

Mixture-of-Experts (MoE) architectures are widely used in modern large language models and multimodal models. However, inference efficiency is often limited by highly dynamic and skewed expert workloads across different modalities. During…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-12 Yingping Wang, Yi Wu, Xiangyu Wu, Junwei Cui, Weilin Cai, Zhijiang Guo, Jiayi Huang

Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently…

Machine Learning · Computer Science 2025-06-27 Jiajie Yang

Mixture-of-Experts (MoE) model architecture has emerged as a promising solution for scaling transformer models efficiently, offering sparse activation that reduces computational costs while increasing model capacity. However, as MoE models…

Machine Learning · Computer Science 2025-02-11 Seokjin Go, Divya Mahajan

Mixture-of-Experts (MoE) models can scale parameter capacity by routing each token to a subset of experts through a learned gate function. While conditional routing reduces training costs, it shifts the burden to inference memory: expert…

Machine Learning · Computer Science 2025-10-07 Rana Shahout, Colin Cai, Yilun Du, Minlan Yu, Michael Mitzenmacher

Mixture-of-Experts (MoE) architectures have become standard in large language models, yet many of their core design choices - expert count, granularity, shared experts, load balancing, token dropping - have only been studied one or two at a…

Machine Learning · Computer Science 2026-05-13 Margaret Li, Sneha Kudugunta, Danielle Rothermel, Luke Zettlemoyer

Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE which eliminates any hard-coded centralized designs including external routers, Softmax,…

Machine Learning · Computer Science 2026-04-02 Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma

The Mixture-of-Experts (MoE) architecture is a powerful technique for scaling language models, yet it often suffers from expert homogenization, where experts learn redundant functionalities, thereby limiting MoE's full potential. To address…

Machine Learning · Computer Science 2026-03-03 Jiaang Li, Haibin Chen, Langming Liu, Yujin Yuan, Yadao Wang, Yizhen Zhang, Chengting Yu, Xin Tong, Weidong Zhang, Shilei Liu, Wenbo Su, Bo Zheng

Load imbalance is a long-standing challenge in Mixture-of-Experts (MoE) training and is exacerbated in reinforcement learning (RL) for LLMs, where hot experts can shift frequently across micro-batches. Existing MoE training systems rely on…

Machine Learning · Computer Science 2026-05-12 Chao Jin, Xinming Wei, Yinmin Zhong, Chengxu Yang, Bingyang Wu, Ruidong Zhu, Zili Zhang, Yuliang Liu, Xin Jin

In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input…

Machine Learning · Computer Science 2024-03-13 Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between accuracy and computation in an MoE model typically…

Machine Learning · Computer Science 2026-02-09 Nurbek Tastan, Stefanos Laskaridis, Karthik Nandakumar, Samuel Horvath