Related papers: Amortized Analysis on Asynchronous Gradient Descen…
Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and…
Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analysis on ASGD take a discrete view and prove upper bounds for their convergence rates. However, the…
Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…
Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning.…
Nesterov's accelerated gradient descent method (AGD) is a seminal deterministic first-order method known to achieve the optimal order of iteration complexity for solving convex smooth optimization problems. Two distinct sequences of…
Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…
The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…
With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, which is,…
While Nesterov's Accelerated Gradient Descent (AGD) efficiently solves constrained problems when the constraint set $X \subseteq \mathbb{R}^n$ is simple and easy to project onto, it remains an open question whether function-constrained…
The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…
This paper concerns asynchrony in iterative processes, focusing on gradient descent and tatonnement, a fundamental price dynamic. Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically,…
In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates…
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees…
This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function $Q$, as a core part of optimization and learning algorithms. Our algorithms…
Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there…
We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable. The method is shown to be unconditionally energy stable, irrespective of the…
Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates. We show that the rate of convergence in all cases consists of two…
The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…