Related papers: Tight Time Complexities in Parallel Stochastic Opt…

200 papers

We consider the decentralized stochastic asynchronous optimization setup, where many workers asynchronously calculate stochastic gradients and asynchronously communicate with each other using edges in a multigraph. For both homogeneous and…

Optimization and Control · Mathematics 2024-11-05 Alexander Tyurin, Peter Richtárik

Asynchronous Stochastic Gradient Descent (Asynchronous SGD) is a cornerstone method for parallelizing learning in distributed machine learning. However, its performance suffers under arbitrarily heterogeneous computation times across…

Machine Learning · Computer Science 2025-06-04 Artavazd Maranjyan, Alexander Tyurin, Peter Richtárik

Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on…

Optimization and Control · Mathematics 2026-02-20 Artavazd Maranjyan, Peter Richtárik

Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization. We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-04 Grigory Begunov, Alexander Tyurin

Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy…

Optimization and Control · Mathematics 2026-01-07 Artavazd Maranjyan

Parallelization is a popular strategy for improving the performance of iterative algorithms. Optimization methods are no exception: design of efficient parallel optimization methods and tight analysis of their theoretical properties are…

Optimization and Control · Mathematics 2023-11-28 Alexander Tyurin, Peter Richtárik

We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially…

Optimization and Control · Mathematics 2024-11-05 Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik

In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the "graph oracle model", an extension of the classic oracle complexity framework that can be applied to study distributed…

Optimization and Control · Mathematics 2021-09-03 Blake Woodworth

We study asynchronous finite sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines -- e.g., modifications of…

Machine Learning · Computer Science 2021-03-11 Margalit Glasgow, Mary Wootters

In the realm of big data and machine learning, data-parallel, distributed stochastic algorithms have drawn significant attention in the present days. While the synchronous versions of these algorithms are well understood in terms of their…

Optimization and Control · Mathematics 2020-04-07 Atal Narayan Sahu, Aritra Dutta, Aashutosh Tiwari, Peter Richtárik

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient…

Machine Learning · Computer Science 2026-05-14 Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than…

Optimization and Control · Mathematics 2026-05-12 Zhirayr Tovmasyan, Artavazd Maranjyan, Peter Richtárik

We investigate the problem of minimizing the expectation of smooth nonconvex functions in a distributed setting with multiple parallel workers that are able to compute stochastic gradients. A significant challenge in this context is the…

Optimization and Control · Mathematics 2025-06-16 Artavazd Maranjyan, Omar Shaikh Omar, Peter Richtárik

We consider a realistic decentralized setup with bandwidth-constrained communication and derive optimal time complexities for non-convex stochastic parallel and asynchronous optimization (up to logarithmic factors). We develop the…

Optimization and Control · Mathematics 2026-03-24 Alexander Tyurin

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous…

Machine Learning · Computer Science 2020-06-25 Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

With the recent proliferation of large-scale learning problems, there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang, Shuai Zheng, James T. Kwok

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis…

Optimization and Control · Mathematics 2020-03-12 Antonio Orvieto, Aurelien Lucchi

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics 2016-05-24 Yitan Li, Linli Xu, Xiaowei Zhong, Qing Ling

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

For SGD based distributed stochastic optimization, computation complexity, measured by the convergence rate in terms of the number of stochastic gradient calls, and communication complexity, measured by the number of inter-node…

Optimization and Control · Mathematics 2019-05-14 Hao Yu, Rong Jin