Related papers: Constrained Deep Learning using Conditional Gradie…

Constraint Guided Gradient Descent: Guided Training with Inequality Constraints

Deep learning is typically performed by learning a neural network solely from data in the form of input-output pairs ignoring available domain knowledge. In this work, the Constraint Guided Gradient Descent (CGGD) framework is proposed that…

Artificial Intelligence · Computer Science 2022-06-15 Quinten Van Baelen, Peter Karsmakers

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Constraint-Based Regularization of Neural Networks

We propose a method for efficiently incorporating constraints into a stochastic gradient Langevin framework for the training of deep neural networks. Constraints allow direct control of the parameter space of the model. Appropriately…

Machine Learning · Computer Science 2021-06-22 Benedict Leimkuhler, Timothée Pouchon, Tiffany Vlaar, Amos Storkey

Better Training using Weight-Constrained Stochastic Dynamics

We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of…

Machine Learning · Computer Science 2021-06-22 Benedict Leimkuhler, Tiffany Vlaar, Timothée Pouchon, Amos Storkey

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Improving Neural Network Training in Low Dimensional Random Bases

Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly…

Machine Learning · Computer Science 2020-11-11 Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi

Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

Gradient clipping is a popular modification to standard (stochastic) gradient descent, at every iteration limiting the gradient norm to a certain value $c > 0$. It is widely used for example for stabilizing the training of deep learning…

Machine Learning · Computer Science 2023-11-10 Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

Symmetry-guided gradient descent for quantum neural networks

Many supervised learning tasks have intrinsic symmetries, such as translational and rotational symmetry in image classifications. These symmetries can be exploited to enhance performance. We formulate the symmetry constraints into a concise…

Quantum Physics · Physics 2024-08-14 Kaiming Bian, Shitao Zhang, Fei Meng, Wen Zhang, Oscar Dahlsten

Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent

Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to…

Machine Learning · Computer Science 2020-06-30 Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing…

Machine Learning · Computer Science 2019-12-16 Yunwen Lei, Ting Hu, Guiying Li, Ke Tang

The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian…

Machine Learning · Statistics 2025-11-17 Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is…

Optimization and Control · Mathematics 2018-07-10 Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

Reinforced stochastic gradient descent for deep neural network learning

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for…

Machine Learning · Statistics 2015-12-25 Chunyuan Li, Changyou Chen, David Carlson, Lawrence Carin

Towards Understanding Gradient Approximation in Equality Constrained Deep Declarative Networks

We explore conditions for when the gradient of a deep declarative node can be approximated by ignoring constraint terms and still result in a descent direction for the global loss function. This has important practical application when…

Machine Learning · Computer Science 2023-06-27 Stephen Gould, Ming Xu, Zhiwei Xu, Yanbin Liu

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

Optimization techniques are of great importance to effectively and efficiently train a deep neural network (DNN). It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score…

Computer Vision and Pattern Recognition · Computer Science 2020-04-09 Hongwei Yong, Jianqiang Huang, Xiansheng Hua, Lei Zhang

An Improved Analysis of Training Over-parameterized Deep Neural Networks

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

Implicit Gradient Alignment in Distributed and Federated Learning

A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients, or mini-batches due to heterogeneity and stochasticity of the distributed data. In this work, we show…

Machine Learning · Computer Science 2021-12-14 Yatin Dandi, Luis Barba, Martin Jaggi

A Bootstrap Perspective on Stochastic Gradient Descent

Machine learning models trained with stochastic gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the…

Machine Learning · Computer Science 2025-12-09 Hongjian Lan, Yucong Liu, Florian Schäfer