Related papers: Network-accelerated Distributed Machine Learning U…

Data optimization for large batch distributed training of deep neural networks

Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and…

Machine Learning · Computer Science 2020-12-21 Shubhankar Gahlot , Junqi Yin , Mallikarjun Shankar

LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models

As model sizes in machine learning continue to scale, distributed training is necessary to accommodate model weights within each device and to reduce training time. However, this comes with the expense of increased communication overhead…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-15 William Won , Saeed Rashidi , Sudarshan Srinivasan , Tushar Krishna

DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization

Multimodal Large Language Models (MLLMs) have achieved remarkable advances by integrating text, image, and audio understanding within a unified architecture. However, existing distributed training frameworks remain fundamentally data-blind:…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-20 Hyeonjun An , Sihyun Kim , Chaerim Lim , Hyunjoon Kim , Rathijit Sen , Sangmin Jung , Hyeonsoo Lee , Dongwook Kim , Takki Yu , Jinkyu Jeong , Youngsok Kim , Kwanghyun Park

Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol

Distributed Machine Learning (DML) systems are utilized to enhance the speed of model training in data centers (DCs) and edge nodes. The Parameter Server (PS) communication architecture is commonly employed, but it faces severe long-tail…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-15 Zixuan Chen , Lei Shi , Xuandong Liu , Xin Ai , Sen Liu , Yang Xu

A Novel Coded Computing Approach for Distributed Multi-Task Learning

Distributed multi-task learning (DMTL) effectively improves model generalization performance through the collaborative training of multiple related models. However, in large-scale learning scenarios, communication bottlenecks severely limit…

Information Theory · Computer Science 2025-07-25 Minquan Cheng , Yongkang Wang , Lingyu Zhang , Youlong Wu

DLL: A Blazing Fast Deep Neural Network Library

Deep Learning Library (DLL) is a new library for machine learning with deep neural networks that focuses on speed. It supports feed-forward neural networks such as fully-connected Artificial Neural Networks (ANNs) and Convolutional Neural…

Machine Learning · Computer Science 2018-04-15 Baptiste Wicht , Jean Hennebert , Andreas Fischer

Accelerating Divisible Load Processing Through Machine Learning: A Practical Framework for Large-Scale Workloads

In this paper, we introduce the first machine learning framework for predicting optimal processing times in Single-Level Tree Network (SLTN) architectures for the Divisible Load Theory (DLT) paradigm. Using a feedforward neural network (FNN)…

Machine Learning · Computer Science 2026-05-25 Bharadwaj Veeravalli

DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

Distributed machine learning (ML) training has become a necessity with the prevalence of billion to trillion-parameter-scale models. While prior work has improved training efficiency from the ML perspective at the application layer, it…

Machine Learning · Computer Science 2026-05-05 Zechen Ma , Zixi Qu , Jinyan Yi , David Lin , Yashar Ganjali

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Distributed deep learning (DL) has become prevalent in recent years to reduce training time by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and datasets. However, system scalability is limited by…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-04 Zhenheng Tang , Shaohuai Shi , Wei Wang , Bo Li , Xiaowen Chu

Understanding and Accelerating the Training of Masked Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling…

Machine Learning · Computer Science 2026-05-14 Chunsan Hong , Sanghyun Lee , Chieh-Hsin Lai , Satoshi Hayakawa , Yuhta Takida , Yuki Mitsufuji , Seungryong Kim , Jong Chul Ye

Dependable Distributed Training of Compressed Machine Learning Models

The existing work on the distributed training of machine learning (ML) models has consistently overlooked the distribution of the achieved learning quality, focusing instead on its average value. This leads to a poor dependability of the…

Machine Learning · Computer Science 2024-02-23 Francesco Malandrino , Giuseppe Di Giacomo , Marco Levorato , Carla Fabiana Chiasserini

Delay-Aware Hierarchical Federated Learning

Federated learning has gained popularity as a means of training models distributed across the wireless edge. The paper introduces delay-aware hierarchical federated learning (DFL) to improve the efficiency of distributed machine learning…

Machine Learning · Computer Science 2023-09-29 Frank Po-Chen Lin , Seyyedali Hosseinalipour , Nicolò Michelusi , Christopher Brinton

Efficient Distributed MLLM Training with Cornstarch

Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio. However, this inherent heterogeneity in MLLM…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Insu Jang , Runyu Lu , Nikhil Bansal , Ang Chen , Mosharaf Chowdhury

Machine Learning for Networking: Workflow, Advances and Opportunities

Recently, machine learning has been used in every possible field to leverage its amazing power. For a long time, the networking and distributed computing system is the key infrastructure to provide efficient computational resource for…

Networking and Internet Architecture · Computer Science 2017-11-17 Mowei Wang , Yong Cui , Xin Wang , Shihan Xiao , Junchen Jiang

Scaling Distributed Machine Learning with In-Network Aggregation

Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-01 Amedeo Sapio , Marco Canini , Chen-Yu Ho , Jacob Nelson , Panos Kalnis , Changhoon Kim , Arvind Krishnamurthy , Masoud Moshref , Dan R. K. Ports , Peter Richtárik

MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection

Choosing appropriate fabrics is critical for meeting functional and quality demands in robotic textile manufacturing, apparel production, and smart retail. We propose MLLM-Fabric, a robotic framework leveraging multimodal large language…

Robotics · Computer Science 2025-10-14 Liman Wang , Hanyang Zhong , Tianyuan Wang , Shan Luo , Jihong Zhu

Benchmark Assessment for DeepSpeed Optimization Library

Deep Learning (DL) models are widely used in machine learning due to their performance and ability to deal with large datasets while producing high accuracy and performance metrics. The size of such datasets and the complexity of DL models…

Machine Learning · Computer Science 2022-02-28 Gongbo Liang , Izzat Alsmadi

Distributed Learning over Unreliable Networks

Most of today's distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-17 Chen Yu , Hanlin Tang , Cedric Renggli , Simon Kassing , Ankit Singla , Dan Alistarh , Ce Zhang , Ji Liu

Enabling Fast and Flexible Distributed Deep Learning with Programmable Switches

Deep learning has been used in a wide range of areas and made a huge breakthrough. With the ever-increasing model size and training data volume, distributed deep learning emerges which utilizes a cluster to train a model in parallel.…

Networking and Internet Architecture · Computer Science 2022-08-11 Heng Pan , Penglai Cui , Zhenyu Li , Ru Jia , Penghao Zhang , Leilei Zhang , Ye Yang , Jiahao Wu , Jianbo Dong , Zheng Cao , Qiang Li , Hongqiang Harry Liu , Mathy Laurent , Gaogang Xie

Distributed Machine Learning via Sufficient Factor Broadcasting

Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML…

Machine Learning · Computer Science 2015-11-30 Pengtao Xie , Jin Kyu Kim , Yi Zhou , Qirong Ho , Abhimanu Kumar , Yaoliang Yu , Eric Xing