
Related papers: Learning Sub-Patterns in Piecewise Continuous Func…

200 papers

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness…

Machine Learning · Computer Science 2024-06-18 Thomas Flynn

In this effort we propose a novel approach for reconstructing multivariate functions from training data, by identifying both a suitable network architecture and an initialization using polynomial-based approximations. Training deep neural…

Machine Learning · Computer Science 2019-05-29 Joseph Daws, Clayton G. Webster

We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices, trained with gradient descent in the supervised learning setting, and prove that gradient descent can achieve optimality without massive…

Machine Learning · Computer Science 2024-10-11 Semih Cayci, Atilla Eryilmaz

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple…

Machine Learning · Computer Science 2019-11-06 Qiang Liu, Lemeng Wu, Dilin Wang

Scaling up network depth is a fundamental pursuit in neural architecture design, as theory suggests that deeper models offer exponentially greater capability. Benefiting from the residual connections, modern neural networks can scale up to…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Dongchen Han, Tianzhu Ye, Zhuofan Xia, Kaiyi Chen, Yulin Wang, Hanting Chen, Gao Huang

Gradient descent typically converges to a single minimum of the training loss without mechanisms to explore alternative minima that may generalize better. Searching for diverse minima directly in high-dimensional parameter space is…

Machine Learning · Computer Science 2025-09-16 Akshay Vegesna, Samip Dahal

Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for…

Machine Learning · Computer Science 2023-01-03 Morgane Ayle, Jan Schuchardt, Lukas Gosch, Daniel Zügner, Stephan Günnemann

Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing…

Machine Learning · Computer Science 2026-05-20 Ha Dang, Sebastian Schmidt, Juergen Hesser

In this paper, we develop a novel second-order method for training feed-forward neural nets. At each iteration, we construct a quadratic approximation to the cost function in a low-dimensional subspace. We minimize this approximation inside…

Computer Vision and Pattern Recognition · Computer Science 2018-05-25 Viacheslav Dudar, Giovanni Chierchia, Emilie Chouzenoux, Jean-Christophe Pesquet, Vladimir Semenov

Physics informed neural networks (PINNs) represent a very popular class of neural solvers for partial differential equations. In practice, one often employs stochastic gradient descent type algorithms to train the neural network. Therefore,…

Machine Learning · Computer Science 2025-09-01 Bangti Jin, Longjun Wu

It has been shown that gradient descent can yield the zero training loss in the over-parametrized regime (the width of the neural networks is much larger than the number of data points). In this work, combining the ideas of some existing…

Optimization and Control · Mathematics 2019-11-05 Lei Li

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the proposed strategy shares an (almost) identical network structure and…

Numerical Analysis · Mathematics 2020-09-25 Eric Chung, Wing Tat Leung, Sai-Mang Pun, Zecheng Zhang

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation…

Machine Learning · Computer Science 2019-06-21 K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely…

Machine Learning · Computer Science 2026-05-20 Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, Stefan Jaeger

This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…

Machine Learning · Statistics 2026-01-01 Xin Chen, Jason M. Klusowski

Deep neural networks have recently achieved state of the art performance thanks to new training algorithms for rapid parameter estimation and new regularization methods to reduce overfitting. However, in practice the network architecture…

Machine Learning · Computer Science 2016-03-04 Minyoung Kim, Luca Rigazio