Related papers: Last-iterate convergence rates for min-max optimiz…
In this paper, we show that simple {Stochastic} subGradient Decent methods with multiple Restarting, named {\bf RSGD}, can achieve a \textit{linear convergence rate} for a class of non-smooth and non-strongly convex optimization problems…
Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining…
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems in the form of $\min_\textbf{x} \max_{\textbf{y} \in Y} f(\textbf{x}, \textbf{y})$, where the…
Recently, minimax optimization received renewed focus due to modern applications in machine learning, robust optimization, and reinforcement learning. The scale of these applications naturally leads to the use of first-order methods.…
We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization…
Incremental gradient and incremental proximal methods are a fundamental class of optimization algorithms used for solving finite sum problems, broadly studied in the literature. Yet, without strong convexity, their convergence guarantees…
In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a., heterogeneous or non…
We study first-order methods for constrained min-max optimization. Existing methods either require two gradient calls or two projections in each iteration, which may be costly in some applications. In this paper, we first show that a…
Mirror descent (MD) is a powerful first-order optimization technique that subsumes several optimization algorithms including gradient descent (GD). In this work, we develop a semi-definite programming (SDP) framework to analyze the…
Communication has been seen as a significant bottleneck in industrial applications over large-scale networks. To alleviate the communication burden, sign-based optimization algorithms have gained popularity recently in both industrial and…
This paper is devoted to the class of paraconvex functions and presents some of its fundamental properties, characterization, and examples that can be used for their recognition and optimization. Next, the convergence analysis of the…
Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based…
We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger…
First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective that has multiple global optima. This phenomenon, known as implicit bias, plays a critical role in…
Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, which aims to solve a nonconvex minimax optimization problem. However, the…
We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a…
Gradient descent methods are fundamental first-order optimization algorithms in both Euclidean spaces and Riemannian manifolds. However, the exact gradient is not readily available in many scenarios. This paper proposes a novel inexact…
We study policy optimization for infinite-horizon, discounted constrained Markov decision processes (CMDPs). While existing theoretical guarantees typically hold for the mixture policy, deploying such a policy is computationally and memory…
Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a…
When equipped with efficient optimization algorithms, the over-parameterized neural networks have demonstrated high level of performance even though the loss function is non-convex and non-smooth. While many works have been focusing on…