Related papers: Last-iterate convergence rates for min-max optimiz…
The stochastic gradient descent has been widely used for solving composite optimization problems in big data analyses. Many algorithms and convergence properties have been developed. The composite functions were convex primarily and…
Many recent applications in machine learning and data fitting call for the algorithmic solution of structured smooth convex optimization problems. Although the gradient descent method is a natural choice for this task, it requires exact…
Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical…
In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…
Equilibrium computation on Riemannian manifolds provides a unifying framework for numerous problems in machine learning and data analytics. One of the simplest yet most fundamental methods is Riemannian gradient descent (RGD). While its…
In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems, whose objective function is weakly convex in the variables of minimization and weakly…
We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of…
Gradient clipping is commonly used in training deep neural networks partly due to its practicability in relieving the exploding gradient problem. Recently, \citet{zhang2019gradient} show that clipped (stochastic) Gradient Descent (GD)…
Many large-scale constrained optimization problems can be formulated as bilevel distributed optimization tasks over undirected networks, where agents collaborate to minimize a global cost function while adhering to constraints, relying only…
We prove novel convergence results for a stochastic proximal gradient algorithm suitable for solving a large class of convex optimization problems, where a convex objective function is given by the sum of a smooth and a possibly non-smooth…
We formulate two classes of first-order algorithms more general than previously studied for minimizing smooth and strongly convex or, respectively, smooth and convex functions. We establish sufficient conditions, via new discrete Lyapunov…
We study the convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex whose gradient is…
In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study…
The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice…
We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms…
The optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD)…
Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the \textbf{Quadratic Gradient (QG)} and rigorously…
The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective…
A variant of consensus based distributed gradient descent (\textbf{DGD}) is studied for finite sums of smooth but possibly non-convex functions. In particular, the local gradient term in the fixed step-size iteration of each agent is…
We study a hybrid conditional gradient - smoothing algorithm (HCGS) for solving composite convex optimization problems which contain several terms over a bounded set. Examples of these include regularization problems with several norms as…