Related papers: Neural Language Modeling With Implicit Cache Point…

Information-Weighted Neural Cache Language Models for ASR

Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive…

Computation and Language · Computer Science 2018-09-25 Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

Improving Neural Language Models with a Continuous Cache

We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them…

Computation and Language · Computer Science 2016-12-15 Edouard Grave, Armand Joulin, Nicolas Usunier

Frustratingly Short Attention Spans in Neural Language Modeling

Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been…

Computation and Language · Computer Science 2017-02-16 Michał Daniluk, Tim Rocktäschel, Johannes Welbl, Sebastian Riedel

On using 2D sequence-to-sequence models for speech recognition

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more…

Computation and Language · Computer Science 2019-11-21 Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally…

Machine Learning · Statistics 2017-11-16 Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

Better Language Model with Hypernym Class Prediction

Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit…

Computation and Language · Computer Science 2022-03-22 He Bai, Tong Wang, Alessandro Sordoni, Peng Shi

Learning to Remember Translation History with a Continuous Cache

Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight…

Computation and Language · Computer Science 2017-11-28 Zhaopeng Tu, Yang Liu, Shuming Shi, Tong Zhang

Revisiting Simple Neural Probabilistic Language Models

Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements. In this paper, we revisit the neural probabilistic language model (NPLM)…

Computation and Language · Computer Science 2021-04-09 Simeng Sun, Mohit Iyyer

Augmenting Language Models with Long-Term Memory

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science 2023-06-13 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

Attending to Characters in Neural Sequence Labeling Models

Sequence labeling architectures use word embeddings for capturing similarity, but suffer when handling previously unseen or rare words. We investigate character-level extensions to such models and propose a novel architecture for combining…

Computation and Language · Computer Science 2016-11-15 Marek Rei, Gamal K. O. Crichton, Sampo Pyysalo

A Comparison of Neural Models for Word Ordering

We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly…

Computation and Language · Computer Science 2017-08-08 Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adrià de Gispert, Bill Byrne

LaMemo: Language Modeling with Look-Ahead Memory

Although Transformers with fully connected self-attentions are powerful to model long-term dependencies, they are struggling to scale to long texts with thousands of words in language modeling. One of the solutions is to equip the model…

Computation and Language · Computer Science 2022-04-27 Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang

aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model

As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and Long Short-Term Memory Models (LSTMs) have recently been proposed for semantic matching…

Information Retrieval · Computer Science 2019-06-04 Liu Yang, Qingyao Ai, Jiafeng Guo, W. Bruce Croft

Gated Word-Character Recurrent Language Model

We introduce a recurrent neural network language model (RNN-LM) with long short-term memory (LSTM) units that utilizes both character-level and word-level inputs. Our model has a gate that adaptively finds the optimal mixture of the…

Computation and Language · Computer Science 2016-10-14 Yasumasa Miyamoto, Kyunghyun Cho

Pointing the Unknown Words

The problem of rare and unknown words is an important issue that can potentially influence the performance of many NLP systems, including both the traditional count-based and the deep learning models. We propose a novel way to deal with the…

Computation and Language · Computer Science 2016-08-23 Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, Yoshua Bengio

Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning

Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Language models (LMs) struggle to perform such reasoning consistently. We propose an approach to pinpoint and rectify multi-hop…

Computation and Language · Computer Science 2024-11-11 Mansi Sakarvadia

Simple linear attention language models balance the recall-throughput tradeoff

Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is bottle-necked during inference by the…

Computation and Language · Computer Science 2025-03-10 Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

Distil-xLSTM: Learning Attention Mechanisms through Recurrent Structures

The current era of Natural Language Processing (NLP) is dominated by Transformer models. However, novel architectures relying on recurrent mechanisms, such as xLSTM and Mamba, have been proposed as alternatives to attention-based models.…

Machine Learning · Computer Science 2025-03-25 Abdoul Majid O. Thiombiano, Brahim Hnich, Ali Ben Mrad, Mohamed Wiem Mkaouer

Attention-based Memory Selection Recurrent Network for Language Modeling

Recurrent neural networks (RNNs) have achieved great success in language modeling. However, since RNNs have a fixed-size memory, their memory cannot store all the information about the words they have seen before in the sentence, and…

Computation and Language · Computer Science 2016-11-29 Da-Rong Liu, Shun-Po Chuang, Hung-yi Lee

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The…

Computation and Language · Computer Science 2023-01-03 Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang