Related papers: Online Learning to Sample

Stochastic Gradient Descent with Adaptive Data

Stochastic gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios. Its convergence analysis is relatively well understood under the assumption that the data samples are…

Machine Learning · Computer Science 2024-10-03 Ethan Che, Jing Dong, Xin T. Tong

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky, Robert Busa-Fekete

On the Convergence of Loss and Uncertainty-based Active Learning Algorithms

We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty…

Machine Learning · Computer Science 2024-11-26 Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Stochastic Gradient Descent (SGD) is one of the most popular algorithms in statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in…

Machine Learning · Statistics 2025-04-08 Ziyang Wei, Wanrong Zhu, Wei Biao Wu

Online Covariance Matrix Estimation in Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm is widely used for parameter estimation, especially for huge data sets and online learning. While this recursive algorithm is popular for computation and memory efficiency, quantifying…

Machine Learning · Statistics 2021-06-23 Wanrong Zhu, Xi Chen, Wei Biao Wu

Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is…

Machine Learning · Statistics 2014-05-14 Peilin Zhao, Tong Zhang

Adaptive Sketches for Robust Regression with Importance Sampling

We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale…

Machine Learning · Computer Science 2022-07-19 Sepideh Mahabadi, David P. Woodruff, Samson Zhou

Stabilized Sparse Online Learning for Sparse Data

Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional…

Machine Learning · Statistics 2017-05-10 Yuting Ma, Tian Zheng

A Variational Analysis of Stochastic Gradient Algorithms

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show…

Machine Learning · Statistics 2017-09-12 Stephan Mandt, Matthew D. Hoffman, David M. Blei

Stochastic Gradient Descent Meets Distribution Regression

Stochastic gradient descent (SGD) provides a simple and efficient way to solve a broad range of machine learning problems. Here, we focus on distribution regression (DR), involving two stages of sampling: Firstly, we regress from…

Machine Learning · Statistics 2021-03-08 Nicole Mücke

Stochastic Multiple Target Sampling Gradient Descent

Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of…

Machine Learning · Computer Science 2023-02-13 Hoang Phan, Ngoc Tran, Trung Le, Toan Tran, Nhat Ho, Dinh Phung

Reinforced stochastic gradient descent for deep neural network learning

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

SVGD: A Virtual Gradients Descent Method for Stochastic Optimization

Inspired by dynamic programming, we propose Stochastic Virtual Gradient Descent (SVGD) algorithm where the Virtual Gradient is defined by computational graph and automatic differentiation. The method is computationally efficient and has…

Machine Learning · Computer Science 2019-08-01 Zheng Li, Shi Shu

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show it attains good generalization…

Machine Learning · Computer Science 2024-09-17 Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang

Online stochastic gradient descent on non-convex losses from high-dimensional inference

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively…

Machine Learning · Statistics 2023-06-23 Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu

Safe Adaptive Importance Sampling

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications. Improved adaptive variants - using importance values defined by the complete gradient information which changes…

Machine Learning · Computer Science 2017-11-08 Sebastian U. Stich, Anant Raj, Martin Jaggi

A Dynamic Sampling Adaptive-SGD Method for Machine Learning

We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…

Machine Learning · Computer Science 2020-03-04 Achraf Bahamou, Donald Goldfarb

AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent

The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a…

Machine Learning · Computer Science 2025-05-29 Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng, Yi Cheng, Guanghui Lan