Related papers: Multilevel Initialization for Layer-Parallel Deep …

Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification

Recurrent Neural Networks (RNNs) can be seriously impacted by the initial parameters assignment, which may result in poor generalization performances on new unseen data. With the objective to tackle this crucial issue, in the context of RNN…

Machine Learning · Computer Science 2019-11-05 Dino Ienco, Roberto Interdonato, Raffaele Gaetano

Multilevel-in-Layer Training for Deep Neural Network Regression

A common challenge in regression is that for many problems, the degrees of freedom required for a high-quality solution also allows for overfitting. Regularization is a class of strategies that seek to restrict the range of possible…

Machine Learning · Computer Science 2022-11-15 Colin Ponce, Ruipeng Li, Christina Mao, Panayot Vassilevski

Multilevel Minimization for Deep Residual Networks

We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical system's viewpoint,…

Machine Learning · Computer Science 2020-04-15 Lisa Gaedke-Merzhäuser, Alena Kopaničáková, Rolf Krause

Layer-Parallel Training of Deep Residual Neural Networks

Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be…

Optimization and Control · Mathematics 2019-07-26 S. Günther, L. Ruthotto, J. B. Schroder, E. C. Cyr, N. R. Gauger

A multi-stage deep learning based algorithm for multiscale model reduction

In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the proposed strategy shares an (almost) identical network structure and…

Numerical Analysis · Mathematics 2020-09-25 Eric Chung, Wing Tat Leung, Sai-Mang Pun, Zecheng Zhang

Initialization and Regularization of Factorized Neural Layers

Factorized layers--operations parameterized by products of two or more matrices--occur in a variety of deep learning contexts, including compressed model training, certain types of knowledge distillation, and multi-head self-attention…

Machine Learning · Statistics 2022-10-07 Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolò Fusi

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science 2018-06-12 Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

A Multi-Level Deep Framework for Deep Solvers of Partial Differential Equations

In this paper, inspired by the multigrid method, we propose a multi-level deep framework for deep solvers. Overall, it divides the entire training process into different levels of training. At each level of training, an adaptive sampling…

Numerical Analysis · Mathematics 2026-02-23 Yu Yang, Qiaolin He

Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training

Deep neural networks are capable of modelling highly non-linear functions by capturing different levels of abstraction of data hierarchically. While training deep networks, first the system is initialized near a good optimum by greedy…

Machine Learning · Computer Science 2016-03-10 Anirban Santara, Debapriya Maji, DP Tejas, Pabitra Mitra, Arobinda Gupta

Reducing Neural Network Parameter Initialization Into an SMT Problem

Training a neural network (NN) depends on multiple factors, including but not limited to the initial weights. In this paper, we focus on initializing deep NN parameters such that it performs better, comparing to random or zero…

Machine Learning · Computer Science 2020-11-10 Mohamad H. Danesh

Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy

Despite the recent success of stochastic gradient descent in deep learning, it is often difficult to train a deep neural network with an inappropriate choice of its initial parameters. Even if training is successful, it has been known that…

Machine Learning · Computer Science 2023-02-10 Cheolhyoung Lee, Kyunghyun Cho

Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear Networks

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Depth-Adaptive Neural Networks from the Optimal Control viewpoint

In recent years, deep learning has been connected with optimal control as a way to define a notion of a continuous underlying learning problem. In this view, neural networks can be interpreted as a discretization of a parametric Ordinary…

Optimization and Control · Mathematics 2020-07-07 Joubine Aghili, Olga Mula

Regularizing deep networks using efficient layerwise adversarial training

Adversarial training has been shown to regularize deep neural networks in addition to increasing their robustness to adversarial examples. However, its impact on very deep state of the art networks has not been fully investigated. In this…

Computer Vision and Pattern Recognition · Computer Science 2018-05-30 Swami Sankaranarayanan, Arpit Jain, Rama Chellappa, Ser Nam Lim

Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models…

Machine Learning · Computer Science 2021-12-21 Karl Hajjar, Lénaïc Chizat, Christophe Giraud

Early alignment in two-layer networks training is a two-edged sword

Training neural networks with first order optimisation methods is at the core of the empirical success of deep learning. The scale of initialisation is a crucial factor, as small initialisations are generally associated to a feature…

Machine Learning · Computer Science 2025-09-16 Etienne Boursier, Nicolas Flammarion

Regularizing Deep Networks by Modeling and Predicting Label Structure

We construct custom regularization functions for use in supervised training of deep neural networks. Our technique is applicable when the ground-truth labels themselves exhibit internal structure; we derive a regularizer by learning an…

Computer Vision and Pattern Recognition · Computer Science 2018-04-09 Mohammadreza Mostajabi, Michael Maire, Gregory Shakhnarovich

Multi-Path Learnable Wavelet Neural Network for Image Classification

Despite the remarkable success of deep learning in pattern recognition, deep network models face the problem of training a large number of parameters. In this paper, we propose and evaluate a novel multi-path wavelet neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-08-27 D. D. N. De Silva, H. W. M. K. Vithanage, K. S. D. Fernando, I. T. S. Piyatilake

Faster Predictive Coding Networks via Better Initialization

Research aimed at scaling up neuroscience inspired learning algorithms for neural networks is accelerating. Recently, a key research area has been the study of energy-based learning algorithms such as predictive coding, due to their…

Machine Learning · Computer Science 2026-01-30 Luca Pinchetti, Simon Frieder, Thomas Lukasiewicz, Tommaso Salvatori