Related papers: Adaptive Sampling Distributed Stochastic Variance …

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to…

Machine Learning · Computer Science 2016-10-18 Ohad Shamir

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky, Robert Busa-Fekete

Data Dependent Convergence for Distributed Stochastic Optimization

In this dissertation we propose alternative analysis of distributed stochastic gradient descent (SGD) algorithms that rely on spectral properties of the data covariance. As a consequence we can relate questions pertaining to speedups and…

Optimization and Control · Mathematics 2016-09-03 Avleen S. Bijral

Online Learning to Sample

Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning. In this work, we accelerate SGD by adaptively learning how to sample the most useful training examples at each time…

Machine Learning · Computer Science 2016-03-16 Guillaume Bouchard, Théo Trouillon, Julien Perez, Adrien Gaidon

Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions

Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by optimizing the utilization of computational resources. One of the classical and popular approaches is Local Stochastic…

Optimization and Control · Mathematics 2024-12-19 Andrey Sadchikov, Savelii Chezhegov, Aleksandr Beznosikov, Alexander Gasnikov

On the Convergence of Local Descent Methods in Federated Learning

In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a., heterogeneous or non…

Machine Learning · Computer Science 2019-12-10 Farzin Haddadpour, Mehrdad Mahdavi

Sampling and Update Frequencies in Proximal Variance-Reduced Stochastic Gradient Methods

Variance-reduced stochastic gradient methods have gained popularity in recent times. Several variants exist with different strategies for the storing and sampling of gradients and this work concerns the interactions between these two…

Optimization and Control · Mathematics 2022-10-19 Martin Morin, Pontus Giselsson

On Data Dependence in Distributed Stochastic Optimization

We study a distributed consensus-based stochastic gradient descent (SGD) algorithm and show that the rate of convergence involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network…

Optimization and Control · Mathematics 2016-09-02 Avleen S. Bijral, Anand D. Sarwate, Nathan Srebro

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is…

Machine Learning · Statistics 2014-05-14 Peilin Zhao, Tong Zhang

L-SVRG and L-Katyusha with Adaptive Sampling

Stochastic gradient-based optimization methods, such as L-SVRG and its accelerated variant L-Katyusha (Kovalev et al., 2020), are widely used to train machine learning models. The theoretical and empirical performance of L-SVRG and…

Machine Learning · Computer Science 2023-06-07 Boxin Zhao, Boxiang Lyu, Mladen Kolar

Safe Adaptive Importance Sampling

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants - using importance values defined by the complete gradient information which changes…

Machine Learning · Computer Science 2017-11-08 Sebastian U. Stich, Anant Raj, Martin Jaggi

Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent

This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD) by utilizing quantities that we have identified as numerically traceable -- the Lipschitz constant for gradients and a concept of the local…

Optimization and Control · Mathematics 2024-09-19 Frederik Köhne, Leonie Kreis, Anton Schiela, Roland Herzog

Distributed Stochastic Gradient Method for Non-Convex Problems with Applications in Supervised Learning

We develop a distributed stochastic gradient descent algorithm for solving non-convex optimization problems under the assumption that the local objective functions are twice continuously differentiable with Lipschitz continuous gradients…

Optimization and Control · Mathematics 2019-08-20 Jemin George, Tao Yang, He Bai, Prudhvi Gurram

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

LSH-sampling Breaks the Computation Chicken-and-egg Loop in Adaptive Stochastic Gradient Estimation

Stochastic Gradient Descent or SGD is the most popular optimization algorithm for large-scale problems. SGD estimates the gradient by uniform sampling with sample size one. There have been several other works that suggest faster epoch-wise…

Machine Learning · Computer Science 2019-11-01 Beidi Chen, Yingchen Xu, Anshumali Shrivastava

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that the asynchronous stochastic…

Machine Learning · Computer Science 2016-12-21 Zhouyuan Huo, Heng Huang

Distributed Stochastic Optimization under a General Variance Condition

Distributed stochastic optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems. Though numerous algorithms have been proposed and successfully applied to general practical…

Optimization and Control · Mathematics 2023-12-15 Kun Huang, Xiao Li, Shi Pu

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li