Related papers: Nesterov-aided Stochastic Gradient Methods using L…
Optimal experimental design (OED) seeks experiments expected to yield the most useful data for some purpose. In practical circumstances where experiments are time-consuming or resource-intensive, OED can yield enormous savings. We pursue…
In this paper we propose an efficient stochastic optimization algorithm to search for Bayesian experimental designs such that the expected information gain is maximized. The gradient of the expected information gain with respect to…
We present a coupled system of ODEs which, when discretized with a constant time step/learning rate, recovers Nesterov's accelerated gradient descent algorithm. The same ODEs, when discretized with a decreasing learning rate, leads to novel…
Projected gradient descent and its Riemannian variant belong to a typical class of methods for low-rank matrix estimation. This paper proposes a new Nesterov's Accelerated Riemannian Gradient algorithm by efficient orthographic retraction…
An optimal experimental set-up maximizes the value of data for statistical inferences and predictions. The efficiency of strategies for finding optimal experimental set-ups is particularly important for experiments that are time-consuming…
We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal…
Stochastic gradient methods are scalable for solving large-scale optimization problems that involve empirical expectations of loss functions. Existing results mainly apply to optimization problems where the objectives are one- or two-level…
In this paper, we propose a new stochastic optimization algorithm for Bayesian inference based on multilevel Monte Carlo (MLMC) methods. In Bayesian statistics, biased estimators of the model evidence have been often used as stochastic…
We present a variant of accelerated gradient descent algorithms, adapted from Nesterov's optimal first-order methods, for weakly-quasi-convex and weakly-quasi-strongly-convex functions. We show that by tweaking the so-called estimate…
A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory…
In this paper, we present a multilevel Monte Carlo (MLMC) version of the Stochastic Gradient (SG) method for optimization under uncertainty, in order to tackle Optimal Control Problems (OCP) where the constraints are described in the form…
We develop a generalization of Nesterov's accelerated gradient descent method which is designed to deal with orthogonality constraints. To demonstrate the effectiveness of our method, we perform numerical experiments which demonstrate that…
In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. More precisely, we interpret a large class of…
Stochastic optimization in learning and inference often relies on Markov chain Monte Carlo (MCMC) to approximate gradients when exact computation is intractable. However, finite-time MCMC estimators are biased, and reducing this bias…
We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence…
We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics,…
Stochastic gradient descent (\textsc{Sgd}) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…
There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error…
We consider nonconvex-concave minimax optimization problems of the form $\min_{\bf x}\max_{\bf y\in{\mathcal Y}} f({\bf x},{\bf y})$, where $f$ is strongly-concave in $\bf y$ but possibly nonconvex in $\bf x$ and ${\mathcal Y}$ is a convex…
Despite their frequent slow convergence, proximal gradient schemes are widely used in large-scale optimization tasks due to their tremendous stability, scalability, and ease of computation. In this paper, we develop and investigate a…