Related papers: Permutation-Based SGD: Is Random Optimal?

Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent

We analyze the convergence rates of two popular variants of coordinate descent (CD): random CD (RCD), in which the coordinates are sampled uniformly at random, and random-permutation CD (RPCD), in which random permutations are used to…

Optimization and Control · Mathematics 2025-05-30 Donghwa Kim, Jaewook Lee, Chulhee Yun

How Good is SGD with Random Shuffling?

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with…

Machine Learning · Computer Science 2021-06-03 Itay Safran, Ohad Shamir

Permutation Randomization on Nonsmooth Nonconvex Optimization: A Theoretical and Experimental Study

While gradient-based optimizers that incorporate randomization often showcase superior performance on complex optimization, the theoretical foundations underlying this superiority remain insufficiently understood. A particularly pressing…

Machine Learning · Computer Science 2025-05-20 Wei Zhang, Arif Hassan Zidan, Afrar Jahin, Yu Bao, Tianming Liu

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

While SGD, which samples from the data with replacement, is widely studied in theory, a variant called Random Reshuffling (RR) is more common in practice. RR iterates through random permutations of the dataset and has been shown to converge…

Machine Learning · Computer Science 2022-02-07 Amirkeivan Mohtashami, Sebastian Stich, Martin Jaggi

Why Random Reshuffling Beats Stochastic Gradient Descent

We analyze the convergence rate of the random reshuffling (RR) method, which is a randomized first-order incremental algorithm for minimizing a finite sum of convex component functions. RR proceeds in cycles, picking a uniformly random…

Optimization and Control · Mathematics 2022-02-09 Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo

Random Shuffling Beats SGD after Finite Epochs

A long-standing problem in the theory of stochastic gradient descent (SGD) is to prove that its without-replacement version RandomShuffle converges faster than the usual with-replacement version. We present the first (to our knowledge)…

Optimization and Control · Mathematics 2019-10-09 Jeff Z. HaoChen, Suvrit Sra

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics 2017-11-16 Alberto Bietti, Julien Mairal

Random Reshuffling: Simple Analysis with Vast Improvements

Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is…

Optimization and Control · Mathematics 2021-04-06 Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond

We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results focusing on final iterate lower bounds in terms…

Machine Learning · Computer Science 2023-06-12 Jaeyoung Cha, Jaewook Lee, Chulhee Yun

Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds

Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the datasets without replacement and with (possible) reshuffling at each…

Optimization and Control · Mathematics 2024-02-08 Xufeng Cai, Cheuk Yin Lin, Jelena Diakonikolas

Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems

Recently, there has been much interest in studying the convergence rates of without-replacement SGD, and proving that it is faster than with-replacement SGD in the worst case. However, known lower bounds ignore the problem's geometry,…

Machine Learning · Computer Science 2021-12-07 Itay Safran, Ohad Shamir

Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization

We study the convergence of the shuffling gradient method, a popular algorithm employed to minimize the finite-sum function with regularization, in which functions are passed to apply (Proximal) Gradient Descent (GD) one by one whose order…

Optimization and Control · Mathematics 2025-05-30 Zijian Liu, Zhengyuan Zhou

Sampling Can Be Faster Than Optimization

Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical…

Machine Learning · Statistics 2022-06-08 Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

Stochastic Learning under Random Reshuffling with Constant Step-sizes

In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works…

Machine Learning · Computer Science 2019-01-30 Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed

Randomness and Permutations in Coordinate Descent Methods

We consider coordinate descent (CD) methods with exact line search on convex quadratic problems. Our main focus is to study the performance of the CD method that uses random permutations in each epoch and compare it to the performance of the…

Optimization and Control · Mathematics 2018-03-23 Mert Gurbuzbalaban, Asuman Ozdaglar, Nuri Denizcan Vanli, Stephen J. Wright

Better Theory for SGD in the Nonconvex World

Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex…

Optimization and Control · Mathematics 2020-07-27 Ahmed Khaled, Peter Richtárik

Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems

Recent theoretical results demonstrate that the convergence rates of permutation-based SGD (e.g., random reshuffling SGD) are faster than uniform-sampling SGD; however, these studies focus mainly on the large epoch regime, where the number…

Machine Learning · Computer Science 2025-06-05 Yujun Kim, Jaeyoung Cha, Chulhee Yun

Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

Modern machine learning models are often over-parameterized and as a result they can interpolate the training data. Under such a scenario, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient…

Machine Learning · Computer Science 2023-04-04 Chen Fan, Christos Thrampoulidis, Mark Schmidt

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms

The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which…

Machine Learning · Computer Science 2023-10-27 Lam M. Nguyen, Trang H. Tran