Related papers: A Relational Gradient Descent Algorithm For Suppor…

Approximate Stochastic Subgradient Estimation Training for Support Vector Machines

Subgradient algorithms for training support vector machines have been quite successful for solving large-scale and online learning problems. However, they have been restricted to linear kernels and strongly convex formulations. This paper…

Machine Learning · Computer Science 2011-11-04 Sangkyun Lee , Stephen J. Wright

Stochastic Subspace Descent

We present two stochastic descent algorithms that apply to unconstrained optimization and are particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained…

Optimization and Control · Mathematics 2019-04-30 David Kozak , Stephen Becker , Alireza Doostan , Luis Tenorio

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin , Ohad Shamir , Karthik Sridharan

Stochastic Gradient Descent Revisited

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics 2025-03-11 Azar Louzi

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient Descent trains neural nerwork with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen , Hongzhi Wang , Chenmin Ba

Efficient Gradient Approximation Method for Constrained Bilevel Optimization

Bilevel optimization has been developed for many machine learning tasks with large-scale and high-dimensional data. This paper considers a constrained bilevel optimization problem, where the lower-level optimization problem is convex with…

Machine Learning · Computer Science 2023-08-22 Siyuan Xu , Minghui Zhu

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu

Insensitive Stochastic Gradient Twin Support Vector Machine for Large Scale Problems

Stochastic gradient descent algorithm has been successfully applied on support vector machines (called PEGASOS) for many classification problems. In this paper, stochastic gradient descent algorithm is investigated to twin support vector…

Machine Learning · Computer Science 2018-08-17 Zhen Wang , Yuan-Hai Shao , Lan Bai , Li-Ming Liu , Nai-Yang Deng

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing…

Machine Learning · Computer Science 2019-12-16 Yunwen Lei , Ting Hu , Guiying Li , Ke Tang

Variance Suppression: Balanced Training Process in Deep Learning

Stochastic gradient descent updates parameters with summation gradient computed from a random data batch. This summation will lead to unbalanced training process if the data we obtained is unbalanced. To address this issue, this paper takes…

Machine Learning · Computer Science 2019-05-22 Tao Yi , Xingxuan Wang

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen , Ronny Luss

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from a viewpoint of graduated optimization, which is a widely applied approach for non-convex…

Optimization and Control · Mathematics 2023-08-15 Da Li , Jingjing Wu , Qingrun Zhang

The Implicit Bias of Gradient Descent on Separable Data

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The…

Machine Learning · Statistics 2024-10-29 Daniel Soudry , Elad Hoffer , Mor Shpigel Nacson , Suriya Gunasekar , Nathan Srebro

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen , Nam H. Nguyen , Dzung T. Phan , Jayant R. Kalagnanam , Katya Scheinberg

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is…

Machine Learning · Computer Science 2021-07-30 Zeyuan Allen-Zhu

SVGD: A Virtual Gradients Descent Method for Stochastic Optimization

Inspired by dynamic programming, we propose Stochastic Virtual Gradient Descent (SVGD) algorithm where the Virtual Gradient is defined by computational graph and automatic differentiation. The method is computationally efficient and has…

Machine Learning · Computer Science 2019-08-01 Zheng Li , Shi Shu

Efficient Dictionary Learning with Gradient Descent

Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of…

Optimization and Control · Mathematics 2018-09-28 Dar Gilboa , Sam Buchanan , John Wright

On the Convergence of SGD Training of Neural Networks

Neural networks are usually trained by some form of stochastic gradient descent (SGD)). A number of strategies are in common use intended to improve SGD optimization, such as learning rate schedules, momentum, and batching. These are…

Neural and Evolutionary Computing · Computer Science 2015-08-13 Thomas M. Breuel

Learning complexity of gradient descent and conjugate gradient algorithms

Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed…

Optimization and Control · Mathematics 2024-12-19 Xianqi Jiao , Jia Liu , Zhiping Chen

Stein variational gradient descent with local approximations

Bayesian computation plays an important role in modern machine learning and statistics to reason about uncertainty. A key computational challenge in Bayesian inference is to develop efficient techniques to approximate, or draw samples from…

Numerical Analysis · Mathematics 2021-09-01 Liang Yan , Tao Zhou