Related papers: Adaptive Proximal Gradient Method for Convex Optim…
In this paper, we propose a proximal gradient method and an accelerated proximal gradient method for solving composite optimization problems, where the objective function is the sum of a smooth and a convex, possibly nonsmooth, function. We…
Gradient methods are widely used in optimization problems. In practice, while the smoothness parameter can be estimated utilizing techniques such as backtracking, estimating the strong convexity parameter remains a challenge; moreover, even…
We propose an adaptive proximal gradient method for minimizing the sum of two functions, where one is a simple convex function, and the other belongs to one of the three classes: nonconvex smooth, convex nonsmooth, or convex smooth. The key…
First-order methods for solving convex optimization problems have been at the forefront of mathematical optimization in the last 20 years. The rapid development of this important class of algorithms is motivated by the success stories…
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no…
We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity…
We present two approximate versions of the proximal subgradient method for minimizing the sum of two convex functions (not necessarily differentiable). The algorithms involve, at each iteration, inexact evaluations of the proximal operator…
In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type of methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of…
We consider solving nonconvex composite optimization problems in which the sum of a smooth function and a nonsmooth function is minimized. Many of convergence analyses of proximal gradient-type methods rely on global descent property…
This work proposes A$^2$GD, a novel adaptive accelerated gradient descent method for convex and composite optimization. Smoothness and convexity constants are updated via Lyapunov analysis. Inspired by stability analysis in ODE solvers, the…
The proximal gradient algorithm has been popularly used for convex optimization. Recently, it has also been extended for nonconvex problems, and the current state-of-the-art is the nonmonotone accelerated proximal gradient algorithm.…
First-order methods with momentum such as Nesterov's fast gradient method are very useful for convex optimization problems, but can exhibit undesirable oscillations yielding slow convergence rates for some applications. An adaptive…
Regularization is a widely recognized technique in mathematical optimization. It can be used to smooth out objective functions, refine the feasible solution set, or prevent overfitting in machine learning models. Due to its simplicity and…
Stochastic gradient descent (\textsc{Sgd}) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…
In this paper, we propose a proximal stochasitc gradient algorithm (PSGA) for solving composite optimization problems by incorporating variance reduction techniques and an adaptive step-size strategy. In the PSGA method, the objective…
We study the convergence of the shuffling gradient method, a popular algorithm employed to minimize the finite-sum function with regularization, in which functions are passed to apply (Proximal) Gradient Descent (GD) one by one whose order…
In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems. Adaptive methods that use exponential moving averages…
This paper optimizes the step coefficients of first-order methods for smooth convex minimization in terms of the worst-case convergence bound (i.e., efficiency) of the decrease in the gradient norm. This work is based on the performance…
In this paper we consider large-scale composite optimization problems having the objective function formed as a sum of two terms (possibly nonconvex), one has (block) coordinate-wise Lipschitz continuous gradient and the other is…
Backtracking linesearch is the de facto approach for minimizing continuously differentiable functions with locally Lipschitz gradient. In recent years, it has been shown that in the convex setting it is possible to avoid linesearch…