Related papers: Learning Sub-Patterns in Piecewise Continuous Func…

An Improved Analysis of Training Over-parameterized Deep Neural Networks

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science · 2019-06-12 · Difan Zou, Quanquan Gu

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science · 2022-03-23 · Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

The duality structure gradient descent algorithm: analysis and applications to neural networks

The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness…

Machine Learning · Computer Science · 2024-06-18 · Thomas Flynn

A Polynomial-Based Approach for Architectural Design and Learning with Deep Neural Networks

In this effort, we propose a novel approach for reconstructing multivariate functions from training data, by identifying both a suitable network architecture and an initialization using polynomial-based approximations. Training deep neural…

Machine Learning · Computer Science · 2019-05-29 · Joseph Daws, Clayton G. Webster

Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices, trained with gradient descent in the supervised learning setting, and prove that gradient descent can achieve optimality without massive…

Machine Learning · Computer Science · 2024-10-11 · Semih Cayci, Atilla Eryilmaz

Splitting Steepest Descent for Growing Neural Architectures

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons into multiple offspring. By leveraging a functional steepest descent idea, we derive a simple…

Machine Learning · Computer Science · 2019-11-06 · Qiang Liu, Lemeng Wu, Dilin Wang

Step by Step Network

Scaling up network depth is a fundamental pursuit in neural architecture design, as theory suggests that deeper models offer exponentially greater capability. Benefiting from the residual connections, modern neural networks can scale up to…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-19 · Dongchen Han, Tianzhu Ye, Zhuofan Xia, Kaiyi Chen, Yulin Wang, Hanting Chen, Gao Huang

Decoupling Search and Learning in Neural Net Training

Gradient descent typically converges to a single minimum of the training loss without mechanisms to explore alternative minima that may generalize better. Searching for diverse minima directly in high-dimensional parameter space is…

Machine Learning · Computer Science · 2025-09-16 · Akshay Vegesna, Samip Dahal

Training Differentially Private Graph Neural Networks with Random Walk Sampling

Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for…

Machine Learning · Computer Science · 2023-01-03 · Morgane Ayle, Jan Schuchardt, Lukas Gosch, Daniel Zügner, Stephan Günnemann

Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions

Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing…

Machine Learning · Computer Science · 2026-05-20 · Ha Dang, Sebastian Schmidt, Juergen Hesser

A Two-Stage Subspace Trust Region Approach for Deep Neural Network Training

In this paper, we develop a novel second-order method for training feed-forward neural nets. At each iteration, we construct a quadratic approximation to the cost function in a low-dimensional subspace. We minimize this approximation inside…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-25 · Viacheslav Dudar, Giovanni Chierchia, Emilie Chouzenoux, Jean-Christophe Pesquet, Vladimir Semenov

Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks

Physics-informed neural networks (PINNs) represent a very popular class of neural solvers for partial differential equations. In practice, one often employs stochastic gradient descent-type algorithms to train the neural network. Therefore,…

Machine Learning · Computer Science · 2025-09-01 · Bangti Jin, Longjun Wu

On the convergence of gradient descent for two layer neural networks

It has been shown that gradient descent can yield zero training loss in the over-parameterized regime (where the width of the neural network is much larger than the number of data points). In this work, combining the ideas of some existing…

Optimization and Control · Mathematics · 2019-11-05 · Lei Li

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science · 2019-12-06 · Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

A multi-stage deep learning based algorithm for multiscale model reduction

In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the proposed strategy shares an (almost) identical network structure and…

Numerical Analysis · Mathematics · 2020-09-25 · Eric Chung, Wing Tat Leung, Sai-Mang Pun, Zecheng Zhang

Gradient Descent with Provably Tuned Learning-rate Schedules

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science · 2025-12-05 · Dravyansh Sharma

Submodular Batch Selection for Training Deep Neural Networks

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation…

Machine Learning · Computer Science · 2019-06-21 · K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian

Training Neural Networks with Optimal Double-Bayesian Learning

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely…

Machine Learning · Computer Science · 2026-05-20 · Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, Stefan Jaeger

Stochastic Gradient Descent for Nonparametric Additive Regression

This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…

Machine Learning · Statistics · 2026-01-01 · Xin Chen, Jason M. Klusowski

Deep Clustered Convolutional Kernels

Deep neural networks have recently achieved state-of-the-art performance thanks to new training algorithms for rapid parameter estimation and new regularization methods to reduce overfitting. However, in practice the network architecture…

Machine Learning · Computer Science · 2016-03-04 · Minyoung Kim, Luca Rigazio