English
Related papers

Related papers: Amortized Analysis on Asynchronous Gradient Descen…

200 papers

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and…

Machine Learning · Computer Science 2021-06-16 Tian Tong , Cong Ma , Yuejie Chi

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analysis on ASGD take a discrete view and prove upper bounds for their convergence rates. However, the…

Machine Learning · Statistics 2018-05-09 Li He , Qi Meng , Wei Chen , Zhi-Ming Ma , Tie-Yan Liu

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science 2024-02-13 Anuraganand Sharma

Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning.…

Machine Learning · Statistics 2022-07-21 Adarsh M. Subramaniam , Akshayaa Magesh , Venugopal V. Veeravalli

Nesterov's accelerated gradient descent method (AGD) is a seminal deterministic first-order method known to achieve the optimal order of iteration complexity for solving convex smooth optimization problems. Two distinct sequences of…

Optimization and Control · Mathematics 2026-03-10 Yan Wu , Yipeng Zhang , Lu Liu , Yuyuan Ouyang

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…

Machine Learning · Computer Science 2026-05-14 Ammar Mahran , Artavazd Maranjyan , Peter Richtárik

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-25 Dan Alistarh , Christopher De Sa , Nikola Konstantinov

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science 2025-02-10 Cabrel Teguemne Fokam , Khaleelulla Khan Nazeer , Lukas König , David Kappel , Anand Subramoney

With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, which is,…

Machine Learning · Computer Science 2020-02-19 Shuxin Zheng , Qi Meng , Taifeng Wang , Wei Chen , Nenghai Yu , Zhi-Ming Ma , Tie-Yan Liu

While Nesterov's Accelerated Gradient Descent (AGD) efficiently solves constrained problems when the constraint set $X \subseteq \mathbb{R}^n$ is simple and easy to project onto, it remains an open question whether function-constrained…

Optimization and Control · Mathematics 2025-12-02 Zhe Zhang , Guanghui Lan

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper , Franz-Josef Pfreundt

This paper concerns asynchrony in iterative processes, focusing on gradient descent and tatonnement, a fundamental price dynamic. Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically,…

Optimization and Control · Mathematics 2016-12-30 Yun Kuen Cheung , Richard Cole

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates…

Machine Learning · Computer Science 2026-05-18 Manuel Graca , L. Miguel Silveira , Arlindo Oliveira , Frank Liu

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees…

Optimization and Control · Mathematics 2023-04-21 Konstantin Mishchenko , Francis Bach , Mathieu Even , Blake Woodworth

This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function $Q$, as a core part of optimization and learning algorithms. Our algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-14 Hagit Attiya , Noa Schiller

Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-11 Karl Bäckström , Marina Papatriantafilou , Philippas Tsigas

We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the…

Optimization and Control · Mathematics 2021-10-04 Hailiang Liu , Xuping Tian

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen , Ronny Luss

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates. We show that the rate of convergence in all cases consists of two…

Machine Learning · Computer Science 2021-06-17 Sebastian U. Stich , Sai Praneeth Karimireddy

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science 2025-10-14 Nikola Surjanovic , Alexandre Bouchard-Côté , Trevor Campbell
‹ Prev 1 2 3 10 Next ›