Machine Learning · Computer Science
Parallel Long Short-Term Memory for Multi-stream Classification
Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès +1
2017-02-15
Distributed, Parallel, and Cluster Computing · Computer Science
PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar +7
2020-06-30
Distributed, Parallel, and Cluster Computing · Computer Science
Efficient allocation of image recognition and LLM tasks on multi-GPU system
Marcin Lawenda, Krzesimir Samborski, Kyrylo Khloponin, Łukasz Szustak
2025-03-20
Distributed, Parallel, and Cluster Computing · Computer Science
pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations
Ruben Laso, Diego Krupitza, Sascha Hunold
2024-02-12
Machine Learning · Computer Science
Sequence Parallelism: Long Sequence Training from System Perspective
Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li +1
2022-05-24
Machine Learning · Computer Science
DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training
Hongkuan Zhou, Da Zheng, Xiang Song, George Karypis +1
2023-07-18
Computation and Language · Computer Science
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley +8
2021-08-25
Machine Learning · Computer Science
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su +3
2022-11-11
Distributed, Parallel, and Cluster Computing · Computer Science
Data-parallel distributed training of very large models beyond GPU capacity
Samuel Matzek, Max Grossman, Minsik Cho, Anar Yusifov +2
2018-11-30
Distributed, Parallel, and Cluster Computing · Computer Science
Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu +4
2025-09-22
Distributed, Parallel, and Cluster Computing · Computer Science
A Scalable Shared-Memory Parallel Simplex for Large-Scale Linear Programming
Demetrios Coutinho, Felipe O. Lins e Silva, Daniel Aloise, Samuel +1
2019-05-29
Machine Learning · Computer Science
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks
Enrico Meloni, Lapo Faggi, Simone Marullo, Alessandro Betti +3
2022-12-05
Distributed, Parallel, and Cluster Computing · Computer Science
Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
Mert Hidayetoglu, Aurick Qiao, Michael Wyatt, Jeff Rasley +2
2026-01-27
Distributed, Parallel, and Cluster Computing · Computer Science
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
Marcel Wagenländer, Guo Li, Bo Zhao, Luo Mai +1
2024-09-27
Distributed, Parallel, and Cluster Computing · Computer Science
Parallel Delta-Stepping Algorithm for Shared Memory Architectures
M. Kranjčević, D. Palossi, S. Pintarelli
2017-02-21
Distributed, Parallel, and Cluster Computing · Computer Science
Towards a Scalable and Distributed Infrastructure for Deep Learning Applications
Bita Hasheminezhad, Shahrzad Shirzad, Nanmiao Wu, Patrick Diehl +2
2021-04-21
Distributed, Parallel, and Cluster Computing · Computer Science
TurboTransformers: An Efficient GPU Serving System For Transformer Models
Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou
2021-02-23
Machine Learning · Computer Science
SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs
Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang +5
2025-02-19
Distributed, Parallel, and Cluster Computing · Computer Science
Performance and Energy Consumption of Parallel Machine Learning Algorithms
Xidong Wu, Preston Brazzle, Stephen Cahoon
2023-05-02
Distributed, Parallel, and Cluster Computing · Computer Science
Ultra-Long Sequence Distributed Transformer
Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen +6
2023-11-09
Distributed, Parallel, and Cluster Computing · Computer Science
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao +9
2020-07-03
Distributed, Parallel, and Cluster Computing · Computer Science
Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
Chong Wang, Nan Du, Tom Gunter, Tao Lei +7
2026-02-10