Related papers: Adaptive Serverless Learning

DePRL: Achieving Linear Convergence Speedup in Personalized Decentralized Learning with Shared Representations

Decentralized learning has emerged as an alternative method to the popular parameter-server framework which suffers from high communication burden, single-point failure and scalability issues due to the need of a central server. However,…

Machine Learning · Computer Science 2023-12-19 Guojun Xiong , Gang Yan , Shiqiang Wang , Jian Li

Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers

Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data in multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Yujing Ma , Florin Rusu , Kesheng Wu , Alexander Sim

Periodic Stochastic Gradient Descent with Momentum for Decentralized Training

Decentralized training has been actively studied in recent years. Although a wide variety of methods have been proposed, yet the decentralized momentum SGD method is still underexplored. In this paper, we propose a novel periodic…

Machine Learning · Computer Science 2020-08-25 Hongchang Gao , Heng Huang

SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data

Decentralized training enables learning with distributed datasets generated at different locations without relying on a central server. In realistic scenarios, the data distribution across these sparsely connected learning agents can be…

Machine Learning · Computer Science 2025-02-27 Sakshi Choudhary , Sai Aparna Aketi , Kaushik Roy

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View

Deep learning has become an indispensable part of life, such as face recognition, NLP, etc., but the training of deep model has always been a challenge, and in recent years, the complexity of training data and models has shown explosive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-18 Sheng Huang

Efficient Decentralized Deep Learning by Dynamic Model Averaging

We propose an efficient protocol for decentralized training of deep neural networks from distributed data sources. The proposed protocol allows to handle different phases of model training equally well and to quickly adapt to concept…

Machine Learning · Computer Science 2018-11-14 Michael Kamp , Linara Adilova , Joachim Sicking , Fabian Hüger , Peter Schlicht , Tim Wirtz , Stefan Wrobel

Asynchronous Decentralized Distributed Training of Acoustic Models

Large-scale distributed training of deep acoustic models plays an important role in today's high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training…

Computation and Language · Computer Science 2021-10-22 Xiaodong Cui , Wei Zhang , Abdullah Kayi , Mingrui Liu , Ulrich Finkler , Brian Kingsbury , George Saon , David Kung

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

Privacy-Preserving Serverless Edge Learning with Decentralized Small Data

In the last decade, data-driven algorithms outperformed traditional optimization-based algorithms in many research areas, such as computer vision, natural language processing, etc. However, extensive data usages bring a new challenge or…

Machine Learning · Computer Science 2021-12-02 Shih-Chun Lin , Chia-Hung Lin

Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning

Distributed training algorithms of deep neural networks show impressive convergence speedup properties on very large problems. However, they inherently suffer from communication related slowdowns and communication topology becomes a crucial…

Machine Learning · Computer Science 2022-03-25 Tomer Avidor , Nadav Tal Israel

FedSPD: A Soft-clustering Approach for Personalized Decentralized Federated Learning

Federated learning has recently gained popularity as a framework for distributed clients to collaboratively train a machine learning model using local data. While traditional federated learning relies on a central server for model…

Machine Learning · Computer Science 2025-09-03 I-Cheng Lin , Osman Yagan , Carlee Joe-Wong

On the Convergence of Decentralized Adaptive Gradient Methods

Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…

Machine Learning · Computer Science 2021-09-08 Xiangyi Chen , Belhal Karimi , Weijie Zhao , Ping Li

Decentralized Deep Learning with Arbitrary Communication Compression

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters. As current approaches suffer from limited bandwidth…

Machine Learning · Computer Science 2020-11-12 Anastasia Koloskova , Tao Lin , Sebastian U. Stich , Martin Jaggi

Scheduling and Communication Schemes for Decentralized Federated Learning

Federated learning (FL) is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. One central server is not enough, due to…

Machine Learning · Computer Science 2023-11-28 Bahaa-Eldin Ali Abdelghany , Ana Fernández-Vilas , Manuel Fernández-Veiga , Nashwa El-Bendary , Ammar M. Hassan , Walid M. Abdelmoez

Learning Rate Adaptation for Federated and Differentially Private Learning

We propose an algorithm for the adaptation of the learning rate for stochastic gradient descent (SGD) that avoids the need for validation set use. The idea for the adaptiveness comes from the technique of extrapolation: to get an estimate…

Machine Learning · Statistics 2020-08-28 Antti Koskela , Antti Honkela

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science 2015-11-03 Caglar Gulcehre , Marcin Moczulski , Yoshua Bengio

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on…

Machine Learning · Computer Science 2013-03-28 Tom Schaul , Yann LeCun

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning,…

Machine Learning · Computer Science 2023-09-19 Hao Sun , Li Shen , Shixiang Chen , Jingwei Sun , Jing Li , Guangzhong Sun , Dacheng Tao

Distributed Federated Learning by Alternating Periods of Training

Federated learning is a privacy-focused approach towards machine learning where models are trained on client devices with locally available data and aggregated at a central server. However, the dependence on a single central server is…

Machine Learning · Computer Science 2026-01-06 Shamik Bhattacharyya , Rachel Kalpana Kalaimani

Communication-Efficient Federated Deep Learning with Asynchronous Model Update and Temporally Weighted Aggregation

Federated learning obtains a central model on the server by aggregating models trained locally on clients. As a result, federated learning does not require clients to upload their data to the server, thereby preserving the data privacy of…

Machine Learning · Computer Science 2020-08-31 Yang Chen , Xiaoyan Sun , Yaochu Jin