Related papers: Pointer Sentinel Mixture Models

Mogrifier LSTM

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the…

Computation and Language · Computer Science 2020-01-30 Gábor Melis, Tomáš Kočiský, Phil Blunsom

Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond

Is the output softmax layer, which is adopted by most language models (LMs), always the best way to compute the next word probability? Given so many attention layers in a modern transformer-based LM, are the pointer networks redundant…

Computation and Language · Computer Science 2023-05-23 Haw-Shiuan Chang, Zonghai Yao, Alolika Gon, Hong Yu, Andrew McCallum

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is…

Computation and Language · Computer Science 2018-03-06 Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

Characterizing Verbatim Short-Term Memory in Neural Language Models

When a language model is trained to predict natural language sequences, its prediction at each moment depends on a representation of prior context. What kind of information about the prior context can language models retrieve? We tested…

Computation and Language · Computer Science 2023-05-03 Kristijan Armeni, Christopher Honey, Tal Linzen

Long-span language modeling for speech recognition

We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the…

Computation and Language · Computer Science 2019-11-13 Sarangarajan Parthasarathy, William Gale, Xie Chen, George Polovets, Shuangyu Chang

Investigation of Large-Margin Softmax in Neural Language Modeling

To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods are developed and widely applied in the face recognition community. The introduction of the large-margin concept…

Audio and Speech Processing · Electrical Eng. & Systems 2021-04-22 Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney

Multi-cell LSTM Based Neural Language Model

Language models, being at the heart of many NLP problems, are always of great interest to researchers. Neural language models come with the advantage of distributed representations and long range contexts. With its particular dynamics that…

Neural and Evolutionary Computing · Computer Science 2018-11-19 Thomas Cherian, Akshay Badola, Vineet Padmanabhan

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

Neural language models have been widely used in various NLP tasks, including machine translation, next word prediction and conversational agents. However, it is challenging to deploy these models on mobile devices due to their slow…

Machine Learning · Computer Science 2018-10-31 Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

Pointing the Unknown Words

The problem of rare and unknown words is an important issue that can potentially influence the performance of many NLP systems, including both the traditional count-based and the deep learning models. We propose a novel way to deal with the…

Computation and Language · Computer Science 2016-08-23 Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, Yoshua Bengio

Revisiting Simple Neural Probabilistic Language Models

Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements. In this paper, we revisit the neural probabilistic language model (NPLM)…

Computation and Language · Computer Science 2021-04-09 Simeng Sun, Mohit Iyyer

Neural Language Modeling With Implicit Cache Pointers

A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts. This approach is a simpler alternative to attention-based pointer mechanism that…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-30 Ke Li, Daniel Povey, Sanjeev Khudanpur

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of…

Computation and Language · Computer Science 2015-06-02 Kai Sheng Tai, Richard Socher, Christopher D. Manning

Nonparametric Masked Language Modeling

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax…

Computation and Language · Computer Science 2023-05-29 Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

Learning to Represent Words in Context with Multilingual Supervision

We present a neural network architecture based on bidirectional LSTMs to compute representations of words in the sentential contexts. These context-sensitive word representations are suitable for, e.g., distinguishing different word senses…

Computation and Language · Computer Science 2015-11-23 Kazuya Kawakami, Chris Dyer

Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

Although masked language models are highly performant and widely adopted by NLP practitioners, they can not be easily used for autoregressive language modelling (next word prediction and sequence probability estimation). We present an…

Computation and Language · Computer Science 2022-08-08 Vilém Zouhar, Marius Mosbach, Dietrich Klakow

Long-Short Range Context Neural Networks for Language Modeling

The goal of language modeling techniques is to capture the statistical and structural properties of natural languages from training corpora. This task typically involves the learning of short range dependencies, which generally model the…

Computation and Language · Computer Science 2017-08-23 Youssef Oualil, Mittul Singh, Clayton Greenberg, Dietrich Klakow

Structured Language Modeling for Speech Recognition

A new language model for speech recognition is presented. The model develops hidden hierarchical syntactic-like structure incrementally and uses it to extract meaningful information from the word history, thus complementing the locality of…

Computation and Language · Computer Science 2007-05-23 Ciprian Chelba, Frederick Jelinek

Larger-Context Language Modelling

In this work, we propose a novel method to incorporate corpus-level discourse information into language modelling. We call this larger-context language model. We introduce a late fusion approach to a recurrent language model based on long…

Computation and Language · Computer Science 2015-12-29 Tian Wang, Kyunghyun Cho

A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling

Statistical language models are central to many applications that use semantics. Recurrent Neural Networks (RNN) are known to produce state of the art results for language modelling, outperforming their traditional n-gram counterparts in…

Computation and Language · Computer Science 2016-02-05 Anantharaman Palacode Narayana Iyer

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of…

Computation and Language · Computer Science 2024-04-12 Nathan Godey, Éric de la Clergerie, Benoît Sagot