English
Related papers

Related papers: Parallel SGD: When does averaging help?

200 papers

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up the training process by using multiple workers. It uses multiple workers to sample local stochastic gradient in parallel, aggregates all…

Optimization and Control · Mathematics 2018-11-19 Hao Yu , Sen Yang , Shenghuo Zhu

Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the…

Machine Learning · Computer Science 2022-01-12 Sunwoo Lee , Anit Kumar Sahu , Chaoyang He , Salman Avestimehr

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function…

Machine Learning · Statistics 2023-11-02 Xi Chen , Jason D. Lee , Xin T. Tong , Yichen Zhang

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…

Machine Learning · Computer Science 2026-05-14 Ammar Mahran , Artavazd Maranjyan , Peter Richtárik

Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems. However, the averaging technique, which combines all iterative…

Machine Learning · Computer Science 2020-05-28 Zhishuai Guo , Yan Yan , Tianbao Yang

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen , Ronny Luss

The growing interest for high dimensional and functional data analysis led in the last decade to an important research developing a consequent amount of techniques. Parallelized algorithms, which consist in distributing and treat the data…

Statistics Theory · Mathematics 2017-10-24 Antoine Godichon-Baggioni , Sofiane Saadane

Stochastic gradient descent (SGD) is the workhorse of modern machine learning. Sometimes, there are many different potential gradient estimators that can be used. When so, choosing the one with the best tradeoff between cost and variance is…

Machine Learning · Computer Science 2020-10-23 Tomas Geffner , Justin Domke

Neural networks are usually trained by some form of stochastic gradient descent (SGD)). A number of strategies are in common use intended to improve SGD optimization, such as learning rate schedules, momentum, and batching. These are…

Neural and Evolutionary Computing · Computer Science 2015-08-13 Thomas M. Breuel

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a solution with good generalization performance; such implicit bias is often characterized in terms of the sharpness of the minima. Kleinberg et…

Machine Learning · Statistics 2024-05-28 Atsushi Nitanda , Ryuhei Kikuchi , Shugo Maeda , Denny Wu

Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-11 Karl Bäckström , Marina Papatriantafilou , Philippas Tsigas

In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to…

Machine Learning · Computer Science 2022-10-11 Vivak Patel , Shushu Zhang , Bowen Tian

Machine learning models trained with \emph{stochastic} gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the…

Machine Learning · Computer Science 2025-12-09 Hongjian Lan , Yucong Liu , Florian Schäfer

We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale…

Machine Learning · Computer Science 2023-11-01 Rustem Islamov , Mher Safaryan , Dan Alistarh

In this dissertation we propose alternative analysis of distributed stochastic gradient descent (SGD) algorithms that rely on spectral properties of the data covariance. As a consequence we can relate questions pertaining to speedups and…

Optimization and Control · Mathematics 2016-09-03 Avleen S. Bijral

We study a distributed consensus-based stochastic gradient descent (SGD) algorithm and show that the rate of convergence involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network…

Optimization and Control · Mathematics 2016-09-02 Avleen S. Bijral , Anand D. Sarwate , Nathan Srebro

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile , Matteo Raviola , Nathan Schaeffer

In this paper, we explore techniques centered around periodic sampling of model weights that provide convergence improvements on gradient update methods (vanilla \acs{SGD}, Momentum, Adam) for a variety of vision problems (classification,…

Machine Learning · Computer Science 2020-03-23 Samarth Tripathi , Jiayi Liu , Unmesh Kurup , Mohak Shah , Sauptik Dhar

We propose Multi-Level Local SGD, a distributed gradient method for learning a smooth, non-convex objective in a heterogeneous multi-level network. Our network model consists of a set of disjoint sub-networks, with a single hub and multiple…

Machine Learning · Computer Science 2022-02-21 Timothy Castiglia , Anirban Das , Stacy Patterson
‹ Prev 1 2 3 10 Next ›