Related papers: Nesterov-aided Stochastic Gradient Methods using L…

Gradient-based stochastic optimization methods in Bayesian experimental design

Optimal experimental design (OED) seeks experiments expected to yield the most useful data for some purpose. In practical circumstances where experiments are time-consuming or resource-intensive, OED can yield enormous savings. We pursue…

Computation · Statistics 2014-12-30 Xun Huan, Youssef M. Marzouk

Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs

In this paper we propose an efficient stochastic optimization algorithm to search for Bayesian experimental designs such that the expected information gain is maximized. The gradient of the expected information gain with respect to…

Computation · Statistics 2022-02-03 Takashi Goda, Tomohiko Hironaka, Wataru Kitade, Adam Foster

Nesterov's method with decreasing learning rate leads to accelerated stochastic gradient descent

We present a coupled system of ODEs which, when discretized with a constant time step/learning rate, recovers Nesterov's accelerated gradient descent algorithm. The same ODEs, when discretized with a decreasing learning rate, leads to novel…

Optimization and Control · Mathematics 2020-09-02 Maxime Laborde, Adam M. Oberman

Fast gradient method for Low-Rank Matrix Estimation

Projected gradient descent and its Riemannian variant belong to a typical class of methods for low-rank matrix estimation. This paper proposes a new Nesterov's Accelerated Riemannian Gradient algorithm by efficient orthographic retraction…

Optimization and Control · Mathematics 2023-06-05 Hongyi Li, Zhen Peng, Chengwei Pan, Di Zhao

Multilevel Double Loop Monte Carlo and Stochastic Collocation Methods with Importance Sampling for Bayesian Optimal Experimental Design

An optimal experimental set-up maximizes the value of data for statistical inferences and predictions. The efficiency of strategies for finding optimal experimental set-ups is particularly important for experiments that are time-consuming…

Numerical Analysis · Mathematics 2020-02-04 Joakim Beck, Ben Mansour Dia, Luis F. R. Espath, Raul Tempone

A Universally Optimal Multistage Accelerated Stochastic Gradient Method

We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal…

Optimization and Control · Mathematics 2019-10-29 Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

Multi-Level Stochastic Gradient Methods for Nested Composition Optimization

Stochastic gradient methods are scalable for solving large-scale optimization problems that involve empirical expectations of loss functions. Existing results mainly apply to optimization problems where the objectives are one- or two-level…

Optimization and Control · Mathematics 2018-01-15 Shuoguang Yang, Mengdi Wang, Ethan X. Fang

Efficient Debiased Evidence Estimation by Multilevel Monte Carlo Sampling

In this paper, we propose a new stochastic optimization algorithm for Bayesian inference based on multilevel Monte Carlo (MLMC) methods. In Bayesian statistics, biased estimators of the model evidence have been often used as stochastic…

Machine Learning · Statistics 2021-02-26 Kei Ishikawa, Takashi Goda

A Note on Nesterov's Accelerated Method in Nonconvex Optimization: a Weak Estimate Sequence Approach

We present a variant of accelerated gradient descent algorithms, adapted from Nesterov's optimal first-order methods, for weakly-quasi-convex and weakly-quasi-strongly-convex functions. We show that by tweaking the so-called estimate…

Optimization and Control · Mathematics 2020-06-16 Jingjing Bu, Mehran Mesbahi

A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization

A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory…

Machine Learning · Computer Science 2019-05-15 Jia Bi, Steve R. Gunn

A Multilevel Stochastic Gradient method for PDE-constrained Optimal Control Problems with uncertain parameters

In this paper, we present a multilevel Monte Carlo (MLMC) version of the Stochastic Gradient (SG) method for optimization under uncertainty, in order to tackle Optimal Control Problems (OCP) where the constraints are described in the form…

Optimization and Control · Mathematics 2019-12-30 Matthieu Martin, Fabio Nobile, Panagiotis Tsilifis

Accelerated Optimization With Orthogonality Constraints

We develop a generalization of Nesterov's accelerated gradient descent method which is designed to deal with orthogonality constraints. To demonstrate the effectiveness of our method, we perform numerical experiments which demonstrate that…

Optimization and Control · Mathematics 2021-01-07 Jonathan W. Siegel

Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise

In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. More precisely, we interpret a large class of…

Machine Learning · Statistics 2020-09-07 Andrei Kulunchakov, Julien Mairal

Convergence of Multi-Level Markov Chain Monte Carlo Adaptive Stochastic Gradient Algorithms

Stochastic optimization in learning and inference often relies on Markov chain Monte Carlo (MCMC) to approximate gradients when exact computation is intractable. However, finite-time MCMC estimators are biased, and reducing this bias…

Statistics Theory · Mathematics 2026-02-02 Antoine Godichon-Baggioni, Gabriel Lang, Sylvain Le Corff, Julien Stoehr, Sobihan Surendran

A Variational Perspective on High-Resolution ODEs

We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence…

Optimization and Control · Mathematics 2023-11-06 Hoomaan Maskan, Konstantinos C. Zygalakis, Alp Yurtsever

Optimal sampling for stochastic and natural gradient descent

We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics,…

Optimization and Control · Mathematics 2026-02-02 Robert Gruhlke, Anthony Nouy, Philipp Trunschke

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng, Yi Cheng, Guanghui Lan

Accelerating Stochastic Gradient Descent For Least Squares Regression

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error…

Machine Learning · Statistics 2018-08-02 Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems

We consider nonconvex-concave minimax optimization problems of the form $\min_{\bf x}\max_{\bf y\in{\mathcal Y}} f({\bf x},{\bf y})$, where $f$ is strongly-concave in $\bf y$ but possibly nonconvex in $\bf x$ and ${\mathcal Y}$ is a convex…

Machine Learning · Computer Science 2020-10-26 Luo Luo, Haishan Ye, Zhichao Huang, Tong Zhang

Accelerating Proximal Gradient-type Algorithms using Damped Anderson Acceleration with Restarts and Nesterov Initialization

Despite their frequent slow convergence, proximal gradient schemes are widely used in large-scale optimization tasks due to their tremendous stability, scalability, and ease of computation. In this paper, we develop and investigate a…

Computation · Statistics 2025-08-19 Nicholas C. Henderson, Ravi Varadhan