Related papers: Better Mini-Batch Algorithms via Accelerated Gradi…
In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches is becoming a golden standard in the…
We focus on solving constrained convex optimization problems using mini-batch stochastic gradient descent. Dynamic sample size rules are presented which ensure a descent direction with high probability. Empirical results from two…
We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems. We call this method AdaBatch and it only…
We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an…
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem…
A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory…
We propose an accelerated meta-algorithm, which allows to obtain accelerated methods for convex unconstrained minimization in different settings. As an application of the general scheme we propose nearly optimal methods for minimizing…
We explore an explicit link between stochastic gradient descent using common batching strategies and splitting methods for ordinary differential equations. From this perspective, we introduce a new minibatching strategy (called Symmetric…
We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large…
We develop multi-step gradient methods for network-constrained optimization of strongly convex functions with Lipschitz-continuous gradients. Given the topology of the underlying network and bounds on the Hessian of the objective function,…
In this paper we consider non-smooth convex optimization problems with (possibly) infinite intersection of constraints. In contrast to the classical approach, where the constraints are usually represented as intersection of simple sets,…
We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function…
We extend the Approximate-Proximal Point (aProx) family of model-based methods for solving stochastic convex optimization problems, including stochastic subgradient, proximal point, and bundle methods, to the minibatch and accelerated…
A set of accelerated first order algorithms with memory are proposed for minimising strongly convex functions. The algorithms are differentiated by their use of the iterate history for the gradient step. The increased convergence rate of…
(Mini-batch) Stochastic Gradient Descent is a popular optimization method which has been applied to many machine learning applications. But a rather high variance introduced by the stochastic gradient in each step may slow down the…
Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot…
We consider the problem of asynchronous stochastic optimization, where an optimization algorithm makes updates based on stale stochastic gradients of the objective that are subject to an arbitrary (possibly adversarial) sequence of delays.…
In this paper, acceleration of gradient methods for convex optimization problems with weak levels of convexity and smoothness is considered. Starting from the universal fast gradient method which was designed to be an optimal method for…
Asynchronous algorithms have attracted much attention recently due to the crucial demands on solving large-scale optimization problems. However, the accelerated versions of asynchronous algorithms are rarely studied. In this paper, we…
We consider the projected gradient algorithm for the nonconvex best subset selection problem that minimizes a given empirical loss function under an $\ell_0$-norm constraint. Through decomposing the feasible set of the given sparsity…