Related papers: Regularizing and Optimizing LSTM Language Models

Neural Networks Compression for Language Modeling

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks in language modeling, are characterized by…

Machine Learning · Statistics · 2019-04-09 · Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko

Recurrent Neural Network Regularization

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In…

Neural and Evolutionary Computing · Computer Science · 2015-02-20 · Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

Revisiting Activation Regularization for Language RNNs

Recurrent neural networks (RNNs) serve as a fundamental building block for many sequence tasks across natural language processing. Recent research has focused on recurrent dropout techniques or custom RNN cells in order to improve…

Computation and Language · Computer Science · 2017-08-04 · Stephen Merity, Bryan McCann, Richard Socher

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic…

Neural and Evolutionary Computing · Computer Science · 2014-02-06 · Haşim Sak, Andrew Senior, Françoise Beaufays

Improved Language Modeling by Decoding the Past

Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next…

Computation and Language · Computer Science · 2019-01-25 · Siddhartha Brahma

Learning Compact Recurrent Neural Networks

Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks. However, these models are often too large in size for deployment on mobile…

Machine Learning · Computer Science · 2016-04-12 · Zhiyun Lu, Vikas Sindhwani, Tara N. Sainath

On the Compression of Recurrent Neural Networks with an Application to LVCSR acoustic modeling for Embedded Speech Recognition

We study the problem of compressing recurrent neural networks (RNNs). In particular, we focus on the compression of RNN acoustic models, which are motivated by the goal of building compact and accurate speech recognition systems which can…

Computation and Language · Computer Science · 2016-05-03 · Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw

Language Modeling through Long Term Memory Network

Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Memory Networks, which contain memory, are widely used to learn patterns in sequential data. Sequential data has long sequences that hold relationships. RNN can…

Computation and Language · Computer Science · 2019-04-22 · Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally…

Machine Learning · Statistics · 2017-11-16 · Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

Depth-Adaptive Graph Recurrent Network for Text Classification

The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network, which views words as nodes and performs layer-wise recurrent steps between them simultaneously. Despite its successes on text representations, the…

Computation and Language · Computer Science · 2020-03-03 · Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

Multi-cell LSTM Based Neural Language Model

Language models, being at the heart of many NLP problems, are always of great interest to researchers. Neural language models come with the advantage of distributed representations and long range contexts. With its particular dynamics that…

Neural and Evolutionary Computing · Computer Science · 2018-11-19 · Thomas Cherian, Akshay Badola, Vineet Padmanabhan

Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings

This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task. Rather than the probabilistic outputs common to neural machine translation models, our…

Computation and Language · Computer Science · 2023-04-07 · Jeremy Wilkerson

Quantifying the vanishing gradient and long distance dependency problem in recursive neural networks and recursive LSTMs

Recursive neural networks (RNN) and their recently proposed extension recursive long short term memory networks (RLSTM) are models that compute representations for sentences, by recursively combining word embeddings according to an…

Artificial Intelligence · Computer Science · 2016-03-02 · Phong Le, Willem Zuidema

Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval

This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks with Long Short-Term Memory (LSTM) cells. Due to its ability to capture long term…

Computation and Language · Computer Science · 2016-11-18 · Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward

N-gram Language Modeling using Recurrent Neural Network Estimation

We investigate the effective memory depth of RNN models by using them for n-gram language model (LM) smoothing. Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell…

Computation and Language · Computer Science · 2017-06-21 · Ciprian Chelba, Mohammad Norouzi, Samy Bengio

Restricted Recurrent Neural Networks

Recurrent Neural Network (RNN) and its variations such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including…

Computation and Language · Computer Science · 2020-05-12 · Enmao Diao, Jie Ding, Vahid Tarokh

Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition

Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-20 · Yu Pan, Jing Xu, Maolin Wang, Jinmian Ye, Fei Wang, Kun Bai, Zenglin Xu

High Order Recurrent Neural Networks for Acoustic Modelling

Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the…

Computation and Language · Computer Science · 2018-02-26 · Chao Zhang, Philip Woodland

Multiplicative LSTM for sequence modelling

We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. mLSTM is characterised by…

Neural and Evolutionary Computing · Computer Science · 2017-10-13 · Ben Krause, Liang Lu, Iain Murray, Steve Renals

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer

Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training. Previous work has…

Computation and Language · Computer Science · 2020-10-28 · Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le