Related papers: Asynchronous Heavy-Tailed Optimization

200 papers

This paper considers the problem of asynchronous stochastic nonconvex optimization with heavy-tailed gradient noise and arbitrarily heterogeneous computation times across workers. We propose an asynchronous normalized stochastic gradient…

Optimization and Control · Mathematics · 2026-01-28 · Yidong Wu, Luo Luo

In existing distributed stochastic optimization studies, it is usually assumed that the gradient noise has a bounded variance. However, recent research shows that the heavy-tailed noise, which allows an unbounded variance, is closer to…

Optimization and Control · Mathematics · 2025-05-15 · Jun Hu, Chao Sun, Bo Chen, Jianzheng Wang, Zheming Wang

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-03-21 · Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In…

Machine Learning · Computer Science · 2017-03-22 · Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz

This paper studies the distributed optimization problem under the influence of heavy-tailed gradient noises. Here, a heavy-tailed noise means that the noise does not necessarily satisfy the bounded variance assumption. Instead, it satisfies…

Optimization and Control · Mathematics · 2025-05-12 · Chao Sun, Huiming Zhang, Bo Chen, Li Yu

We consider stochastic optimization problems with heavy-tailed noise with structured density. For such problems, we show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$, when the…

Optimization and Control · Mathematics · 2024-04-18 · Nikita Puchkin, Eduard Gorbunov, Nikolay Kutuzov, Alexander Gasnikov

We consider a standard distributed optimization problem in which networked nodes collaboratively minimize the sum of their locally known convex costs. For this setting, we address for the first time the fundamental problem of design and…

Optimization and Control · Mathematics · 2025-06-02 · Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar

Existing decentralized stochastic optimization methods assume the lower-level loss function is strongly convex and the stochastic gradient noise has finite variance. These strong assumptions typically are not satisfied in real-world machine…

Machine Learning · Computer Science · 2026-05-26 · Xinwen Zhang, Yihan Zhang, Heng Liang, Hongchang Gao

Heavy-tailed noise in nonconvex stochastic optimization has garnered increasing research interest, as empirical studies, including those on training attention models, suggest it is a more realistic gradient noise condition. This paper…

Optimization and Control · Mathematics · 2026-04-17 · Shuhua Yu, Dusan Jakovetic, Soummya Kar

Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter.…

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous…

Machine Learning · Computer Science · 2020-06-25 · Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

We study the distributed stochastic optimization (DSO) problem under a heavy-tailed noise condition by utilizing a multi-agent system. Despite the extensive research on DSO algorithms used to solve DSO problems under light-tailed noise…

Optimization and Control · Mathematics · 2025-09-23 · Zhan Yu, Lan Liao, Deming Yuan, Daniel W. C. Ho, Ding-Xuan Zhou

Distributed optimization has become the default training paradigm in modern machine learning due to the growing scale of models and datasets. To mitigate communication overhead, local updates are often applied before global aggregation,…

Machine Learning · Computer Science · 2025-08-15 · Su Hyeong Lee, Manzil Zaheer, Tian Li

While adaptive gradient methods are the workhorse of modern machine learning, sign-based optimization algorithms such as Lion and Muon have recently demonstrated superior empirical performance over AdamW in training large language models…

Machine Learning · Computer Science · 2026-05-11 · Dingzhi Yu, Hongyi Tao, Yuanyu Wan, Luo Luo, Lijun Zhang

The empirical evidence indicates that stochastic optimization with heavy-tailed gradient noise is more appropriate to characterize the training of machine learning models than that with standard bounded gradient variance noise. Most…

Machine Learning · Computer Science · 2026-01-28 · Hongxu Chen, Ke Wei, Xiaoming Yuan, Luo Luo

Synchronous federated learning scales poorly due to the straggler effect. Asynchronous algorithms increase the update throughput by processing updates upon arrival, but they introduce two fundamental challenges: gradient staleness, which…

Machine Learning · Computer Science · 2026-03-30 · Abdelkrim Alahyane, Céline Comte, Matthieu Jonckheere

Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear. Modelling stochastic optimization…

Machine Learning · Statistics · 2020-06-12 · Liam Hodgkinson, Michael W. Mahoney

In the era of large-scale neural network models, optimization algorithms often struggle with generalization due to an overreliance on training loss. One key insight widely accepted in the machine learning community is the idea that wide…

Machine Learning · Computer Science · 2025-09-01 · Bodu Gong, Gustavo Enrique Batista, Pierre Lafaye de Micheaux

We present an algorithm for distributed estimation of an unknown vector parameter $\boldsymbol{\theta}^\ast \in {\mathbb R}^M$ in the presence of heavy-tailed observation and communication noises. Heavy-tailed noises frequently appear,…

Information Theory · Computer Science · 2026-03-24 · Dragana Bajovic, Dusan Jakovetic, Soummya Kar, Manojlo Vukovic

Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have…

Machine Learning · Statistics · 2025-05-06 · Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu