Related papers: Reducing Runtime by Recycling Samples

Stop Wasting My Gradients: Practical SVRG

We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors…

Machine Learning · Computer Science 2016-08-06 Reza Babanezhad, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen

Sampling and Update Frequencies in Proximal Variance-Reduced Stochastic Gradient Methods

Variance-reduced stochastic gradient methods have gained popularity in recent times. Several variants exist with different strategies for the storing and sampling of gradients and this work concerns the interactions between these two…

Optimization and Control · Mathematics 2022-10-19 Martin Morin, Pontus Giselsson

Stochastic Reweighted Gradient Descent

Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA), or the periodic full gradient…

Optimization and Control · Mathematics 2021-03-24 Ayoub El Hanchi, David A. Stephens

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

Variance-Reduced Stochastic Learning under Random Reshuffling

Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence properties to the exact minimizer. The existing convergence results…

Machine Learning · Computer Science 2018-02-19 Bicheng Ying, Kun Yuan, Ali H. Sayed

Variance Reduced Stochastic Gradient Descent with Neighbors

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness,…

Machine Learning · Computer Science 2016-02-29 Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

Fast Variance Reduction Method with Stochastic Batch Size

In this paper we study a family of variance reduction methods with randomized batch size: at each step, the algorithm first randomly chooses the batch size and then selects a batch of samples to conduct a variance-reduced stochastic…

Machine Learning · Computer Science 2018-08-08 Xuanqing Liu, Cho-Jui Hsieh

Stochastic Optimization with Importance Sampling

Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Gradient Descent (prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform…

Machine Learning · Statistics 2015-01-05 Peilin Zhao, Tong Zhang

Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling

Stochastic optimization algorithms are widely used for machine learning with large-scale data. However, their convergence often suffers from non-vanishing variance. Variance Reduction (VR) methods, such as SVRG and SARAH, address this issue…

Machine Learning · Computer Science 2026-01-12 Daniil Medyakov, Gleb Molodtsov, Savelii Chezhegov, Alexey Rebrikov, Aleksandr Beznosikov

A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity

SGD (Stochastic Gradient Descent) is a popular algorithm for large scale optimization problems due to its low iterative cost. However, SGD can not achieve linear convergence rate as FGD (Full Gradient Descent) because of the inherent…

Machine Learning · Computer Science 2017-12-05 Aixiang Chen, Bingchuan Chen, Xiaolong Chai, Rui Bian, Hengguang Li

An Analysis of Stochastic Variance Reduced Gradient for Linear Inverse Problems

Stochastic variance reduced gradient (SVRG) is a popular variance reduction technique for accelerating stochastic gradient descent (SGD). We provide a first analysis of the method for solving a class of linear inverse problems in the lens…

Numerical Analysis · Mathematics 2022-01-19 Bangti Jin, Zehui Zhou, Jun Zou

Carathéodory Sampling for Stochastic Gradient Descent

Many problems require to optimize empirical risk functions over large data sets. Gradient descent methods that calculate the full gradient in every descent step do not scale to such datasets. Various flavours of Stochastic Gradient Descent…

Machine Learning · Computer Science 2020-11-26 Francesco Cosentino, Harald Oberhauser, Alessandro Abate

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to…

Machine Learning · Computer Science 2016-10-18 Ohad Shamir

Trading-off variance and complexity in stochastic gradient descent

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate,…

Machine Learning · Statistics 2016-03-23 Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Starting Small -- Learning with Adaptive Sample Sizes

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set. In this context, we investigate strategies for dynamically increasing the effective sample size, when…

Machine Learning · Computer Science 2016-10-10 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

On the Batch Size Selection in Stochastic Gradient Methods Using No-Replacement Sampling

Recent stochastic gradient methods that have appeared in the literature base their efficiency and global convergence properties on a suitable control of the variance of the gradient batch estimate. This control is typically achieved by…

Optimization and Control · Mathematics 2025-06-11 Marco Boresta, Alberto De Santis, Stefano Lucidi

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile, Matteo Raviola, Nathan Schaeffer

k-SVRG: Variance Reduction for Large Scale Optimization

Variance reduced stochastic gradient (SGD) methods converge significantly faster than the vanilla SGD counterpart. However, these methods are not very practical on large scale problems, as they either i) require frequent passes over the…

Optimization and Control · Mathematics 2018-10-17 Anant Raj, Sebastian U. Stich

Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields

We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent…

Machine Learning · Statistics 2015-04-20 Mark Schmidt, Reza Babanezhad, Mohamed Osama Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar

Cocoercivity, Smoothness and Bias in Variance-Reduced Stochastic Gradient Methods

With the purpose of examining biased updates in variance-reduced stochastic gradient methods, we introduce SVAG, a SAG/SAGA-like method with adjustable bias. SVAG is analyzed in a cocoercive root-finding setting, a setting which yields the…

Optimization and Control · Mathematics 2022-10-19 Martin Morin, Pontus Giselsson