Related papers: Duality between subgradient and conditional gradie…
We provide new insight into a {\em generalized conditional subgradient} algorithm and a {\em generalized mirror descent} algorithm for the convex minimization problem \[ \min_x \; \{f(Ax) + h(x)\}.\] As Bach showed in [{\em SIAM J. Optim.},…
In this paper we propose distributed dual gradient algorithms for linearly constrained separable convex problems and analyze their rate of convergence under different assumptions. Under the strong convexity assumption on the primal…
We introduce and analyze a new family of first-order optimization algorithms which generalizes and unifies both mirror descent and dual averaging. Within the framework of this family, we define new algorithms for constrained optimization…
Using an optimization algorithm to solve a machine learning problem is one of the mainstream approaches in the field of science. In this work, we present a comprehensive comparison of some state-of-the-art first-order optimization algorithms for…
In this paper we consider a class of optimization problems with a strongly convex objective function and the feasible set given by an intersection of a simple convex set with a set given by a number of linear equality and inequality…
We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the…
We propose primal-dual stochastic mirror descent for the convex optimization problems with functional constraints. We obtain the rate of convergence in terms of probability of large deviations.
As the problem of minimizing functionals on the Wasserstein space encompasses many applications in machine learning, different optimization algorithms on $\mathbb{R}^d$ have received counterparts on the Wasserstein space. We…
We consider stochastic gradient methods under the interpolation regime where a perfect fit can be obtained (minimum loss at each observation). While previous work highlighted the implicit regularization of such algorithms, we consider an…
The mirror descent algorithm is known to be effective in situations where it is beneficial to adapt the mirror map to the underlying geometry of the optimization model. However, the effect of mirror maps on the geometry of distributed…
Bilevel programs are optimization problems in which some variables are themselves solutions to optimization problems. They arise in a variety of control applications, including control of vehicle traffic networks, inverse reinforcement…
Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. For a long time, it has been recognized that looking at the dual of an optimization problem may drastically simplify…
We study alternating first-order algorithms with no inner loops for solving nonconvex-strongly-concave min-max problems. We show the convergence of the alternating gradient descent--ascent method by proposing a substantially…
The first part of this work established the foundations of a radial duality between nonnegative optimization problems, inspired by the work of (Renegar, 2016). Here we utilize our radial duality theory to design and analyze projection-free…
We show that the primal-dual gradient method, also known as the gradient descent ascent method, for solving convex-concave minimax problems can be viewed as an inexact gradient method applied to the primal problem. The gradient, whose exact…
First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective that has multiple global optima. This phenomenon, known as implicit bias, plays a critical role in…
While first-order optimization methods are usually designed to efficiently reduce the function value $f(x)$, there has been recent interest in methods efficiently reducing the magnitude of $\nabla f(x)$, and the findings show that the two…
Dual first-order methods are powerful techniques for large-scale convex optimization. Although an extensive research effort has been devoted to studying their convergence properties, explicit convergence rates for the primal iterates have…
We propose a variant of the classical conditional gradient method for sparse inverse problems with differentiable measurement models. Such models arise in many practical problems including superresolution, time-series modeling, and matrix…
Online learning algorithms are fast, memory-efficient, easy to implement, and applicable to many prediction problems, including classification, regression, and ranking. Several online algorithms were proposed in the past few decades, some…