Related papers: Nesterov-aided Stochastic Gradient Methods using L…
This paper explores numerical methods for solving a convex differentiable semi-infinite program. We introduce a primal-dual gradient method which performs three updates iteratively: a momentum gradient ascend step to update the constraint…
In this work, we investigate a stochastic gradient descent method for solving inverse problems that can be written as systems of linear or nonlinear ill-posed equations in Banach spaces. The method uses only a randomly selected equation at…
Convergence analysis of accelerated first-order methods for convex optimization problems are presented from the point of view of ordinary differential equation solvers. A new dynamical system, called Nesterov accelerated gradient flow, has…
In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent…
We consider multi-level composite optimization problems where each mapping in the composition is the expectation over a family of random smooth mappings or the sum of some finite number of smooth mappings. We present a normalized proximal…
In this paper, we proposed a new technique, {\em variance controlled stochastic gradient} (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the variance of gradient by…
This paper investigates distributed zeroth-order optimization for smooth nonconvex problems, targeting the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation in current algorithms that use either the…
In this paper, we describe a new way to get convergence rates for optimal methods in smooth (strongly) convex optimization tasks. Our approach is based on results for tasks where gradients have nonrandom small noises. Unlike previous…
Nesterov's accelerated gradient (AG) is a popular technique to optimize objective functions comprising two components: a convex loss and a penalty function. While AG methods perform well for convex penalties, such as the LASSO, convergence…
With the success that the field of bilevel optimization has seen in recent years, similar methodologies have started being applied to solving more difficult applications that arise in trilevel optimization. At the helm of these applications…
Bayesian Neural Networks (BNNs) provide a promising framework for modeling predictive uncertainty and enhancing out-of-distribution robustness (OOD) by estimating the posterior distribution of network parameters. Stochastic Gradient Markov…
Bayesian inference tasks continue to pose a computational challenge. This especially holds for spatial-temporal modeling where high-dimensional latent parameter spaces are ubiquitous. The methodology of integrated nested Laplace…
Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…
We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, we prove that…
In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient…
Nesterov's well-known scheme for accelerating gradient descent in convex optimization problems is adapted to accelerating stationary iterative solvers for linear systems. Compared with classical Krylov subspace acceleration methods, the…
In this paper, we present a stochastic augmented Lagrangian approach on (possibly infinite-dimensional) Riemannian manifolds to solve stochastic optimization problems with a finite number of deterministic constraints.We investigate the…
Established methods for unsupervised representation learning such as variational autoencoders produce none or poorly calibrated uncertainty estimates making it difficult to evaluate if learned representations are stable and reliable. In…
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…
Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method…