Related papers: Learning Sub-Patterns in Piecewise Continuous Func…
A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…
State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…
The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness…
In this effort we propose a novel approach for reconstructing multivariate functions from training data, by identifying both a suitable network architecture and an initialization using polynomial-based approximations. Training deep neural…
We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices, trained with gradient descent in the supervised learning setting, and prove that gradient descent can achieve optimality \emph{without} massive…
We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple…
Scaling up network depth is a fundamental pursuit in neural architecture design, as theory suggests that deeper models offer exponentially greater capability. Benefiting from the residual connections, modern neural networks can scale up to…
Gradient descent typically converges to a single minimum of the training loss without mechanisms to explore alternative minima that may generalize better. Searching for diverse minima directly in high-dimensional parameter space is…
Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for…
Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing…
In this paper, we develop a novel second-order method for training feed-forward neural nets. At each iteration, we construct a quadratic approximation to the cost function in a low-dimensional subspace. We minimize this approximation inside…
Physics informed neural networks (PINNs) represent a very popular class of neural solvers for partial differential equations. In practice, one often employs stochastic gradient descent type algorithms to train the neural network. Therefore,…
It has been shown that gradient descent can yield the zero training loss in the over-parametrized regime (the width of the neural networks is much larger than the number of data points). In this work, combining the ideas of some existing…
When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…
In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the pro-posed strategy shares an (almost) identical network structure and…
Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…
Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation…
Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely…
This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…
Deep neural networks have recently achieved state of the art performance thanks to new training algorithms for rapid parameter estimation and new regularization methods to reduce overfitting. However, in practice the network architecture…