Related papers: Statistically Preconditioned Accelerated Gradient …

Iterative Pre-Conditioning to Expedite the Gradient-Descent Method

This paper considers the problem of multi-agent distributed optimization. In this problem, there are multiple agents in the system, and each agent only knows its local cost function. The objective for the agents is to collectively compute a…

Optimization and Control · Mathematics 2020-03-31 Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

On Accelerating Distributed Convex Optimizations

This paper studies a distributed multi-agent convex optimization problem. The system comprises multiple agents in this problem, each with a set of local data points and an associated local cost function. The agents are connected to a…

Optimization and Control · Mathematics 2021-08-20 Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

Parametrized Accelerated Methods Free of Condition Number

Analyses of accelerated (momentum-based) gradient descent usually assume bounded condition number to obtain exponential convergence rates. However, in many real problems, e.g., kernel methods or deep neural networks, the condition number,…

Machine Learning · Computer Science 2018-03-06 Chaoyue Liu, Mikhail Belkin

Acceleration in Distributed Optimization under Similarity

We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be similar, due to statistical data similarity or otherwise. In order…

Optimization and Control · Mathematics 2022-04-12 Ye Tian, Gesualdo Scutari, Tianyu Cao, Alexander Gasnikov

Adaptive Consensus Gradients Aggregation for Scaled Distributed Training

Distributed machine learning has recently become a critical paradigm for training large models on vast datasets. We examine the stochastic optimization problem for deep learning within synchronous parallel computing environments under…

Machine Learning · Computer Science 2024-11-07 Yoni Choukroun, Shlomi Azoulay, Pavel Kisilev

Temporal Predictive Coding for Gradient Compression in Distributed Learning

This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server…

Information Theory · Computer Science 2024-10-04 Adrian Edin, Zheng Chen, Michel Kieffer, Mikael Johansson

Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets

We study distributed optimization algorithms for minimizing the average of heterogeneous functions distributed across several machines with a focus on communication efficiency. In such settings, naively using the classical stochastic…

Machine Learning · Computer Science 2020-11-18 Ilqar Ramazanli, Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabas Poczos

Hyperfast Second-Order Local Solvers for Efficient Statistically Preconditioned Distributed Optimization

Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the…

Optimization and Control · Mathematics 2022-10-05 Pavel Dvurechensky, Dmitry Kamzolov, Aleksandr Lukashevich, Soomin Lee, Erik Ordentlich, César A. Uribe, Alexander Gasnikov

Locally Accelerated Conditional Gradients

Conditional gradients constitute a class of projection-free first-order algorithms for smooth convex optimization. As such, they are frequently used in solving smooth convex optimization problems over polytopes, for which the computational…

Optimization and Control · Mathematics 2019-10-14 Jelena Diakonikolas, Alejandro Carderera, Sebastian Pokutta

Accelerated projected gradient algorithms for sparsity constrained optimization problems

We consider the projected gradient algorithm for the nonconvex best subset selection problem that minimizes a given empirical loss function under an $\ell_0$-norm constraint. Through decomposing the feasible set of the given sparsity…

Optimization and Control · Mathematics 2026-02-13 Jan Harold Alcantara, Ching-pei Lee

Accelerated Gradient Methods for Networked Optimization

We develop multi-step gradient methods for network-constrained optimization of strongly convex functions with Lipschitz-continuous gradients. Given the topology of the underlying network and bounds on the Hessian of the objective function,…

Optimization and Control · Mathematics 2015-06-12 Euhanna Ghadimi, Iman Shames, Mikael Johansson

Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning

In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithms combine (accelerated) mini-batch SGD with a new…

Machine Learning · Computer Science 2018-02-12 Di Wang, Jinhui Xu

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

This paper considers the multi-agent linear least-squares problem in a server-agent network. In this problem, the system comprises multiple agents, each having a set of local data points, that are connected to a server. The goal for the…

Optimization and Control · Mathematics 2024-10-29 Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

Communication-Efficient Distributed Optimization with Quantized Preconditioners

We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of…

Optimization and Control · Mathematics 2021-06-21 Foivos Alimisis, Peter Davies, Dan Alistarh

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

On the convergence rate of distributed gradient methods for finite-sum optimization under communication delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems,…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan, Carolyn L. Beck, R. Srikant

Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions

Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps decrease the local condition number of a function by appropriately transforming its gradient. For a convex function, where the gradient can be…

Optimization and Control · Mathematics 2023-08-29 Dmitrii A. Pasechnyuk, Alexander Gasnikov, Martin Takáč

Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning

Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision. In this work, we propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class. Our…

Optimization and Control · Mathematics 2020-02-28 Zhenzhang Ye, Thomas Möllenhoff, Tao Wu, Daniel Cremers

Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning

Modern realities and trends in learning require more and more generalization ability of models, which leads to an increase in both models and training sample size. It is already difficult to solve such tasks in a single device mode. This is…

Optimization and Control · Mathematics 2024-12-03 Dmitry Bylinkin, Kirill Degtyarev, Aleksandr Beznosikov