Related papers: Triangular Dropout: Variable Network Width without…

Adaptive Width Neural Networks

For almost 70 years, researchers have typically selected the width of neural networks' layers either manually or through automated hyperparameter tuning methods such as grid search and, more recently, neural architecture search. This paper…

Machine Learning · Computer Science 2026-02-17 Federico Errica, Henrik Christiansen, Viktor Zaverkin, Mathias Niepert, Francesco Alesiani

Structural Dropout for Model Width Compression

Existing ML models are known to be highly over-parametrized, and use significantly more resources than required for a given task. Prior work has explored compressing models offline, such as by distilling knowledge from larger models into…

Machine Learning · Computer Science 2022-05-17 Julian Knodt

Reducing Transformer Depth on Demand with Structured Dropout

Overparameterized transformer networks have obtained state of the art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of…

Machine Learning · Computer Science 2019-09-26 Angela Fan, Edouard Grave, Armand Joulin

Reflash Dropout in Image Super-Resolution

Dropout is designed to relieve the overfitting problem in high-level vision tasks but is rarely applied in low-level vision tasks, like image super-resolution (SR). As a classic regression problem, SR exhibits a different behaviour as…

Computer Vision and Pattern Recognition · Computer Science 2022-04-21 Xiangtao Kong, Xina Liu, Jinjin Gu, Yu Qiao, Chao Dong

Learning Sparse Networks Using Targeted Dropout

Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away…

Machine Learning · Computer Science 2019-09-10 Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Shakeout: A New Approach to Regularized Deep Neural Network Training

Recent years have witnessed the success of deep neural networks in dealing with plenty of practical problems. Dropout has played an essential role in many successful deep neural networks, by inducing regularization in the model training.…

Computer Vision and Pattern Recognition · Computer Science 2019-04-16 Guoliang Kang, Jun Li, Dacheng Tao

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Neural networks are often over-parameterized and hence benefit from aggressive regularization. Conventional regularization methods, such as Dropout or weight decay, do not leverage the structures of the network's inputs and hidden states.…

Machine Learning · Computer Science 2021-01-07 Hieu Pham, Quoc V. Le

Gradual DropIn of Layers to Train Very Deep Neural Networks

We introduce the concept of dynamically growing a neural network during training. In particular, an untrainable deep network starts as a trainable shallow network and newly added layers are slowly, organically added during training, thereby…

Neural and Evolutionary Computing · Computer Science 2015-11-24 Leslie N. Smith, Emily M. Hand, Timothy Doster

Guided Dropout

Dropout is often used in deep neural networks to prevent over-fitting. Conventionally, dropout training invokes random drop of nodes from the hidden layers of a Neural Network. It is our hypothesis that a guided selection of nodes…

Machine Learning · Computer Science 2018-12-11 Rohit Keshari, Richa Singh, Mayank Vatsa

On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures, in general, is related to the network structure, and how it may be influenced by other design choices including activation type,…

Machine Learning · Computer Science 2017-03-02 Vamsi K Ithapu, Sathya N Ravi, Vikas Singh

On the interplay of network structure and gradient convergence in deep learning

The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep…

Machine Learning · Computer Science 2017-02-23 Vamsi K Ithapu, Sathya N Ravi, Vikas Singh

Multi-Sample Dropout for Accelerated Training and Better Generalization

Dropout is a simple but efficient regularization technique for achieving better generalization of deep neural networks (DNNs); hence it is widely used in tasks based on DNNs. During training, dropout randomly discards a portion of the…

Neural and Evolutionary Computing · Computer Science 2020-10-22 Hiroshi Inoue

Data Dropout in Arbitrary Basis for Deep Network Regularization

An important problem in training deep networks with high capacity is to ensure that the trained network works well when presented with new inputs outside the training dataset. Dropout is an effective regularization technique to boost the…

Computer Vision and Pattern Recognition · Computer Science 2017-12-06 Mostafa Rahmani, George Atia

Dropout improves Recurrent Neural Networks for Handwriting Recognition

Recurrent neural networks (RNNs) with Long Short-Term memory cells currently hold the best known results in unconstrained handwriting recognition. We show that their performance can be greatly improved using dropout - a recently proposed…

Computer Vision and Pattern Recognition · Computer Science 2014-03-11 Vu Pham, Théodore Bluche, Christopher Kermorvant, Jérôme Louradour

Phase Diagram of Dropout for Two-Layer Neural Networks in the Mean-Field Regime

Dropout is a standard training technique for neural networks that consists of randomly deactivating units at each step of their gradient-based training. It is known to improve performance in many settings, including in the large-scale…

Machine Learning · Computer Science 2025-10-10 Lénaïc Chizat, Pierre Marion, Yerkin Yesbay

Variational Nested Dropout

Nested dropout is a variant of dropout operation that is able to order network parameters or features based on the pre-defined importance during training. It has been explored for: I. Constructing nested nets: the nested nets are neural…

Machine Learning · Computer Science 2022-06-20 Yufei Cui, Yu Mao, Ziquan Liu, Qiao Li, Antoni B. Chan, Xue Liu, Tei-Wei Kuo, Chun Jason Xue

Dynamic DropConnect: Enhancing Neural Network Robustness through Adaptive Edge Dropping Strategies

Dropout and DropConnect are well-known techniques that apply a consistent drop rate to randomly deactivate neurons or edges in a neural network layer during training. This paper introduces a novel methodology that assigns dynamic drop rates…

Machine Learning · Computer Science 2025-02-28 Yuan-Chih Yang, Hung-Hsuan Chen

Neuron-Specific Dropout: A Deterministic Regularization Technique to Prevent Neural Networks from Overfitting & Reduce Dependence on Large Training Samples

In order to develop complex relationships between their inputs and outputs, deep neural networks train and adjust a large number of parameters. To make these networks work at high accuracy, vast amounts of data are needed. Sometimes, however,…

Machine Learning · Computer Science 2022-01-19 Joshua Shunk

Learning Compact Convolutional Neural Networks with Nested Dropout

Recently, nested dropout was proposed as a method for ordering representation units in autoencoders by their information content, without diminishing reconstruction cost. However, it has only been applied to training fully-connected…

Computer Vision and Pattern Recognition · Computer Science 2015-04-13 Chelsea Finn, Lisa Anne Hendricks, Trevor Darrell

Curriculum Dropout

Dropout is a very effective way of regularizing neural networks. Stochastically "dropping out" units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network…

Neural and Evolutionary Computing · Computer Science 2017-08-04 Pietro Morerio, Jacopo Cavazza, Riccardo Volpi, Rene Vidal, Vittorio Murino