Related papers: Regularizing RNNs by Stabilizing Activations

Revisiting Activation Regularization for Language RNNs

Recurrent neural networks (RNNs) serve as a fundamental building block for many sequence tasks across natural language processing. Recent research has focused on recurrent dropout techniques or custom RNN cells in order to improve…

Computation and Language · Computer Science 2017-08-04 Stephen Merity, Bryan McCann, Richard Socher

Regularizing Neural Networks by Penalizing Confident Output Distributions

We systematically explore regularizing neural networks by penalizing low entropy output distributions. We show that penalizing low entropy output distributions, which has been shown to improve exploration in reinforcement learning, acts as…

Neural and Evolutionary Computing · Computer Science 2017-01-24 Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton

Regularizing and Optimizing LSTM Language Models

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this…

Computation and Language · Computer Science 2017-08-09 Stephen Merity, Nitish Shirish Keskar, Richard Socher

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving…

Neural and Evolutionary Computing · Computer Science 2017-09-26 David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods

Advancements in parallel processing have led to a surge in multilayer perceptrons' (MLP) applications and deep learning in the past decades. Recurrent Neural Networks (RNNs) give additional representational power to feedforward MLPs by…

Machine Learning · Statistics 2014-10-22 Saahil Ognawala, Justin Bayer

Noisy Recurrent Neural Networks

We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by…

Machine Learning · Statistics 2021-12-02 Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

Understanding and Controlling Memory in Recurrent Neural Networks

To be effective in sequential data processing, Recurrent Neural Networks (RNNs) are required to keep track of past events by creating memories. While the relation between memories and the network's hidden state dynamics was established over…

Machine Learning · Computer Science 2019-09-17 Doron Haviv, Alexander Rivkind, Omri Barak

Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task

In this paper, we have investigated recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task due to its…

Computation and Language · Computer Science 2018-06-20 Jan Vanek, Josef Michalek, Josef Psutka

A Dynamic Penalty Function Approach for Constraints-Handling in Reinforcement Learning

Reinforcement learning (RL) is attracting attention as an effective way to solve sequential optimization problems that involve high dimensional state/action space and stochastic uncertainties. Many such problems involve constraints…

Machine Learning · Computer Science 2021-04-01 Haeun Yoo, Victor M. Zavala, Jay H. Lee

Infinity-norm-based Input-to-State-Stable Long Short-Term Memory networks: a thermal systems perspective

Recurrent Neural Networks (RNNs) have shown remarkable performances in system identification, particularly in nonlinear dynamical systems such as thermal processes. However, stability remains a critical challenge in practical applications:…

Optimization and Control · Mathematics 2025-10-17 Stefano De Carli, Davide Previtali, Leandro Pitturelli, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi

Occam's Gates

We present a complementary objective for training recurrent neural networks (RNN) with gating units that helps with regularization and interpretability of the trained model. Attention-based RNN models have shown success in many difficult…

Machine Learning · Computer Science 2015-06-30 Jonathan Raiman, Szymon Sidor

Resurrecting Recurrent Neural Networks for Long Sequences

Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have…

Machine Learning · Computer Science 2023-03-14 Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

Adversarial Learning with Margin-based Triplet Embedding Regularization

Deep neural networks (DNNs) have achieved great success on a variety of computer vision tasks; however, they are highly vulnerable to adversarial attacks. To address this problem, we propose to improve the local smoothness of the…

Computer Vision and Pattern Recognition · Computer Science 2019-09-23 Yaoyao Zhong, Weihong Deng

Regularized deep learning with nonconvex penalties

Regularization methods are often employed in deep learning neural networks (DNNs) to prevent overfitting. For penalty based DNN regularization methods, convex penalties are typically considered because of their optimization guarantees.…

Machine Learning · Statistics 2022-04-07 Sujit Vettam, Majnu John

Regularization and nonlinearities for neural language models: when are they needed?

Neural language models (LMs) based on recurrent neural networks (RNN) are some of the most successful word and character-level LMs. Why do they work so well, in particular better than linear neural LMs? Possible explanations are that RNNs…

Machine Learning · Statistics 2013-06-21 Marius Pachitariu, Maneesh Sahani

Noise Stability Regularization for Improving BERT Fine-tuning

Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of…

Computation and Language · Computer Science 2021-07-13 Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

Recurrent Neural Network Regularization

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In…

Neural and Evolutionary Computing · Computer Science 2015-02-20 Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

(Almost) Smooth Sailing: Towards Numerical Stability of Neural Networks Through Differentiable Regularization of the Condition Number

Maintaining numerical stability in machine learning models is crucial for their reliability and performance. One approach to maintain stability of a network layer is to integrate the condition number of the weight matrix as a regularizing…

Machine Learning · Computer Science 2024-10-02 Rossen Nenov, Daniel Haider, Peter Balazs

Consistency of Neural Networks with Regularization

Neural networks have attracted a lot of attention due to their success in applications such as natural language processing and computer vision. For large scale data, due to the tremendous number of parameters in neural networks, overfitting…

Machine Learning · Statistics 2022-07-05 Xiaoxi Shen, Jinghang Lin

Regularized Binary Network Training

There is a significant performance gap between Binary Neural Networks (BNNs) and floating point Deep Neural Networks (DNNs). We propose to improve the binary training method, by introducing a new regularization function that encourages…

Machine Learning · Computer Science 2020-04-22 Sajad Darabi, Mouloud Belbahri, Matthieu Courbariaux, Vahid Partovi Nia