Machine Learning · Computer Science
A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks
Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele
2022-07-04
Distributed, Parallel, and Cluster Computing · Computer Science
Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems
Haowei Yang, Yu Tian, Zhongheng Yang, Zhao Wang +2
2025-06-25
Distributed, Parallel, and Cluster Computing · Computer Science
Two-dimensional Sparse Parallelism for Large Scale Deep Learning Recommendation Model Training
Xin Zhang, Quanyu Zhu, Liangbei Xu, Zain Huda +7
2025-08-07
Distributed, Parallel, and Cluster Computing · Computer Science
Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia, Matei Zaharia, Alex Aiken
2018-07-23
Distributed, Parallel, and Cluster Computing · Computer Science
Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu +4
2025-09-22
Machine Learning · Computer Science
Parallel training of linear models without compromising convergence
Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell
2018-12-20
Distributed, Parallel, and Cluster Computing · Computer Science
Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training
Michael Benington, Leo Phan, Chris Pierre Paul, Evan Shoemaker +4
2023-10-12
Performance · Computer Science
A Comparative Measurement Study of Deep Learning as a Service Framework
Yanzhao Wu, Ling Liu, Calton Pu, Wenqi Cao +3
2019-08-20
Distributed, Parallel, and Cluster Computing · Computer Science
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
Moiz Arif, Avinash Maurya, Sudharshan Vazhkudai, Bogdan Nicolae
2026-05-20
Machine Learning · Computer Science
Learning to Optimize Tensor Programs
Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang +4
2019-01-10
Computation and Language · Computer Science
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley +8
2021-08-25
Programming Languages · Computer Science
Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis
2021-11-17
Machine Learning · Computer Science
Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach
Ruifeng She, Bowen Pang, Kai Li, Zehua Liu +1
2025-03-13
Distributed, Parallel, and Cluster Computing · Computer Science
Using Meta-heuristics and Machine Learning for Software Optimization of Parallel Computing Systems: A Systematic Literature Review
Suejb Memeti, Sabri Pllana, Alecio Binotto, Joanna Kolodziej +1
2018-05-03
Distributed, Parallel, and Cluster Computing · Computer Science
tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads
Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis
2021-07-05
Distributed, Parallel, and Cluster Computing · Computer Science
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach
Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang +2
2020-03-10
Distributed, Parallel, and Cluster Computing · Computer Science
Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho +5
2019-06-11
Machine Learning · Computer Science
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu +4
2022-11-08
Distributed, Parallel, and Cluster Computing · Computer Science
The Case for Co-Designing Model Architectures with Hardware
Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman +5
2024-02-01
Distributed, Parallel, and Cluster Computing · Computer Science
Characterizing Communication Patterns in Distributed Large Language Model Inference
Lang Xu, Kaushik Kandadi Suresh, Quentin Anthony, Nawras Alnaasan +1
2025-07-22
Distributed, Parallel, and Cluster Computing · Computer Science
Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks
Burak Topcu, Musa Oguzhan Cim, Poovaiah Palangappa, Meena Arunachalam +1
2026-03-09
Distributed, Parallel, and Cluster Computing · Computer Science
Distributed Training Large-Scale Deep Architectures
Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou +5
2017-09-21
Machine Learning · Computer Science
Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks
Vahid Partovi Nia, Alireza Ghaffari, Mahdi Zolnouri, Yvon Savaria
2022-09-23
Distributed, Parallel, and Cluster Computing · Computer Science
On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang +1
2024-07-03