Related papers: Asynchronous Parallel Incremental Block-Coordinate…

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper, Franz-Josef Pfreundt

Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework

Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and…

Machine Learning · Computer Science 2025-05-16 Yijie Zhou, Shi Pu

Distributed Machine Learning through Heterogeneous Edge Systems

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-19 Hanpeng Hu, Dan Wang, Chuan Wu

Large problems are not necessarily hard: A case study on distributed NMPC paying off

A key motivation in the development of Distributed Model Predictive Control (DMPC) is to accelerate centralized Model Predictive Control (MPC) for large-scale systems. DMPC has the prospect of scaling well by parallelizing computations…

Optimization and Control · Mathematics 2025-04-16 Gösta Stomberg, Maurice Raetsch, Alexander Engelmann, Timm Faulwasser

Accelerating Optimization and Machine Learning through Decentralization

Decentralized optimization enables multiple devices to learn a global machine learning model while each individual device only has access to its local dataset. By avoiding the need for training data to leave individual users' devices, it…

Machine Learning · Computer Science 2026-04-22 Ziqin Chen, Zuang Wang, Yongqiang Wang

Federated Block Coordinate Descent Scheme for Learning Global and Personalized Models

In federated learning, models are learned from users' data that are held private in their edge devices, by aggregating them in the service provider's "cloud" to obtain a global model. Such global model is of great commercial value in, e.g.,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-02 Ruiyuan Wu, Anna Scaglione, Hoi-To Wai, Nurullah Karakoc, Kari Hreinsson, Wing-Kin Ma

Decentralised and collaborative machine learning framework for IoT

Decentralised machine learning has recently been proposed as a potential solution to the security issues of the canonical federated learning approach. In this paper, we propose a decentralised and collaborative machine learning framework…

Machine Learning · Computer Science 2023-12-20 Martín González-Soto, Rebeca P. Díaz-Redondo, Manuel Fernández-Veiga, Bruno Rodríguez-Castro, Ana Fernández-Vilas

FedBCD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning

Although Federated Learning has been widely studied in recent years, there are still high overhead expenses in each communication round for large-scale models such as Vision Transformer. To lower the communication complexity, we propose a…

Machine Learning · Computer Science 2026-04-21 Junkang Liu, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Yuangang Li, YunXiang Gong

Scaling Distributed Machine Learning with In-Network Aggregation

Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-01 Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik

Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers

In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science 2025-10-30 Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

Communication-Efficient Learning of Deep Networks from Decentralized Data

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image…

Machine Learning · Computer Science 2023-01-30 H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas

Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization

Decoupled learning is a branch of model parallelism which parallelizes the training of a network by splitting it depth-wise into multiple modules. Techniques from decoupled learning usually lead to stale gradient effect because of their…

Machine Learning · Computer Science 2020-12-08 Huiping Zhuang, Zhiping Lin, Kar-Ann Toh

A Survey on Distributed Machine Learning

The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase…

Machine Learning · Computer Science 2022-11-28 Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, Jan S. Rellermeyer

Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms

Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large scale learning, as utilized by many Big Data…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper, Franz-Josef Pfreundt

Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing

Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-05 Hao Chen, Yu Ye, Ming Xiao, Mikael Skoglund, H. Vincent Poor

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments

The deployment of large-scale models, such as large language models (LLMs), incurs substantial costs due to their computational demands. To mitigate these costs and address challenges related to scalability and data security, there is a…

Machine Learning · Computer Science 2025-08-14 Yipeng Du, Zihao Wang, Ahmad Farhan, Claudio Angione, Harry Yang, Fielding Johnston, James P. Buban, Patrick Colangelo, Yue Zhao, Yuzhe Yang

Block Decomposable Methods for Large-Scale Optimization Problems

This dissertation explores block decomposable methods for large-scale optimization problems. It focuses on alternating direction method of multipliers (ADMM) schemes and block coordinate descent (BCD) methods. Specifically, it introduces a…

Optimization and Control · Mathematics 2026-01-15 Leandro Farias Maia

Outlook Towards Deployable Continual Learning for Particle Accelerators

Particle Accelerators are high power complex machines. To ensure uninterrupted operation of these machines, thousands of pieces of equipment need to be synchronized, which requires addressing many challenges including design, optimization…

Machine Learning · Computer Science 2025-04-08 Kishansingh Rajput, Sen Lin, Auralee Edelen, Willem Blokland, Malachi Schram

A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings

In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and…

Machine Learning · Computer Science 2024-05-27 Madison Threadgill, Andreas Gerstlauer

Leveraging The Edge-to-Cloud Continuum for Scalable Machine Learning on Decentralized Data

With mobile, IoT and sensor devices becoming pervasive in our life and recent advances in Edge Computational Intelligence (e.g., Edge AI/ML), it became evident that the traditional methods for training AI/ML models are becoming obsolete,…

Machine Learning · Computer Science 2023-06-21 Ahmed M. Abdelmoniem