Related papers: Permutation-Based SGD: Is Random Optimal?

200 papers

We analyze the convergence rates of two popular variants of coordinate descent (CD): random CD (RCD), in which the coordinates are sampled uniformly at random, and random-permutation CD (RPCD), in which random permutations are used to…

Optimization and Control · Mathematics 2025-05-30 Donghwa Kim, Jaewook Lee, Chulhee Yun

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with…

Machine Learning · Computer Science 2021-06-03 Itay Safran, Ohad Shamir

While gradient-based optimizers that incorporate randomization often showcase superior performance on complex optimization, the theoretical foundations underlying this superiority remain insufficiently understood. A particularly pressing…

Machine Learning · Computer Science 2025-05-20 Wei Zhang, Arif Hassan Zidan, Afrar Jahin, Yu Bao, Tianming Liu

While SGD, which samples from the data with replacement, is widely studied in theory, a variant called Random Reshuffling (RR) is more common in practice. RR iterates through random permutations of the dataset and has been shown to converge…

Machine Learning · Computer Science 2022-02-07 Amirkeivan Mohtashami, Sebastian Stich, Martin Jaggi

We analyze the convergence rate of the random reshuffling (RR) method, which is a randomized first-order incremental algorithm for minimizing a finite sum of convex component functions. RR proceeds in cycles, picking a uniformly random…

Optimization and Control · Mathematics 2022-02-09 Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo

A long-standing problem in the theory of stochastic gradient descent (SGD) is to prove that its without-replacement version RandomShuffle converges faster than the usual with-replacement version. We present the first (to our knowledge)…

Optimization and Control · Mathematics 2019-10-09 Jeff Z. HaoChen, Suvrit Sra

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics 2017-11-16 Alberto Bietti, Julien Mairal

Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is…

Optimization and Control · Mathematics 2021-04-06 Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results focusing on final iterate lower bounds in terms…

Machine Learning · Computer Science 2023-06-12 Jaeyoung Cha, Jaewook Lee, Chulhee Yun

Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the datasets without replacement and with (possible) reshuffling at each…

Optimization and Control · Mathematics 2024-02-08 Xufeng Cai, Cheuk Yin Lin, Jelena Diakonikolas

Recently, there has been much interest in studying the convergence rates of without-replacement SGD, and proving that it is faster than with-replacement SGD in the worst case. However, known lower bounds ignore the problem's geometry,…

Machine Learning · Computer Science 2021-12-07 Itay Safran, Ohad Shamir

We study the convergence of the shuffling gradient method, a popular algorithm employed to minimize the finite-sum function with regularization, in which functions are passed to apply (Proximal) Gradient Descent (GD) one by one whose order…

Optimization and Control · Mathematics 2025-05-30 Zijian Liu, Zhengyuan Zhou

Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical…

Machine Learning · Statistics 2022-06-08 Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works…

Machine Learning · Computer Science 2019-01-30 Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed

We consider coordinate descent (CD) methods with exact line search on convex quadratic problems. Our main focus is to study the performance of the CD method that uses random permutations in each epoch and compare it to the performance of the…

Optimization and Control · Mathematics 2018-03-23 Mert Gurbuzbalaban, Asuman Ozdaglar, Nuri Denizcan Vanli, Stephen J. Wright

Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex…

Optimization and Control · Mathematics 2020-07-27 Ahmed Khaled, Peter Richtárik

Recent theoretical results demonstrate that the convergence rates of permutation-based SGD (e.g., random reshuffling SGD) are faster than uniform-sampling SGD; however, these studies focus mainly on the large epoch regime, where the number…

Machine Learning · Computer Science 2025-06-05 Yujun Kim, Jaeyoung Cha, Chulhee Yun

Modern machine learning models are often over-parameterized and as a result they can interpolate the training data. Under such a scenario, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient…

Machine Learning · Computer Science 2023-04-04 Chen Fan, Christos Thrampoulidis, Mark Schmidt

The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which…

Machine Learning · Computer Science 2023-10-27 Lam M. Nguyen, Trang H. Tran