Machine Learning · Computer Science
A Survey on Inference Optimization Techniques for Mixture of Experts Models
Jiacheng Liu, Peng Tang, Wenfeng Wang, Yuhang Ren +4
2025-01-23
Computer Vision and Pattern Recognition · Computer Science
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia, Xuefeng Xiao +1
2025-10-13
Computer Vision and Pattern Recognition · Computer Science
Effective Quantization for Diffusion Models on CPUs
Hanwen Chang, Haihao Shen, Yiyang Cai, Xinyu Ye +6
2023-11-30
Machine Learning · Computer Science
DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference
Farhana Amin, Sabiha Afroz, Kanchon Gharami, Mona Moghadampanah +1
2025-11-17
Computation and Language · Computer Science
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li +4
2025-05-07
Computer Vision and Pattern Recognition · Computer Science
Accelerating Diffusion Transformer via Error-Optimized Cache
Junxiang Qiu, Shuo Wang, Jinda Lu, Lin Liu +3
2025-07-21
Computer Vision and Pattern Recognition · Computer Science
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu +3
2025-06-13
Machine Learning · Computer Science
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang +4
2022-07-25
Computation and Language · Computer Science
Efficient Inference For Neural Machine Translation
Yi-Te Hsu, Sarthak Garg, Yi-Hsiu Liao, Ilya Chatsviorkin
2020-10-08
Machine Learning · Computer Science
Ultra-Sparse Memory Network
Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu +3
2025-02-07
Computer Vision and Pattern Recognition · Computer Science
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li +1
2024-09-10
Computer Vision and Pattern Recognition · Computer Science
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
Senmao Li, Taihang Hu, Joost van de Weijer, Fahad Shahbaz Khan +6
2024-10-16
Machine Learning · Computer Science
Accelerated AI Inference via Dynamic Execution Methods
Haim Barad, Jascha Achterberg, Tien Pei Chou, Jean Yu
2024-11-05
Computer Vision and Pattern Recognition · Computer Science
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
Alireza Ganjdanesh, Yan Kang, Yuchen Liu, Richard Zhang +2
2024-09-25
Machine Learning · Computer Science
SparseDM: Toward Sparse Efficient Diffusion Models
Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi +1
2025-04-18
Machine Learning · Computer Science
A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation
Jiacheng Liu, Xinyu Wang, Yuqi Lin, Zhikai Wang +9
2025-11-04
Machine Learning · Computer Science
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan +7
2022-07-04
Machine Learning · Computer Science
Efficiently Scaling Transformer Inference
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin +6
2022-11-10
Machine Learning · Computer Science
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Yahui Liu, Yang Yue, Jingyuan Zhang, Chenxi Sun +4
2025-12-02
Computer Vision and Pattern Recognition · Computer Science
DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li +3
2026-04-07
Machine Learning · Computer Science
Fast MoE Inference via Predictive Prefetching and Expert Replication
Ankit Jyothish, Ali Jannesari, Aishwarya Sarkar, Joseph Zuber
2026-05-13
Computer Vision and Pattern Recognition · Computer Science
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan, Malcolm Chadwick +4
2025-08-12
Distributed, Parallel, and Cluster Computing · Computer Science
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke +5
2023-06-21