Related papers: Noisy Parallel Approximate Decoding for Conditiona…

Parallel Attention Mechanisms in Neural Machine Translation

Recent papers in neural machine translation have proposed the strict use of attention mechanisms over previous standards such as recurrent and convolutional neural networks (RNNs and CNNs). We propose that by running traditionally stacked…

Computation and Language · Computer Science 2018-10-31 Julian Richard Medina, Jugal Kalita

Can neural machine translation do simultaneous translation?

We investigate the potential of attention-based neural machine translation in simultaneous translation. We introduce a novel decoding algorithm, called simultaneous greedy decoding, that allows an existing neural machine translation model…

Computation and Language · Computer Science 2016-06-08 Kyunghyun Cho, Masha Esipova

Contextual Text Denoising with Masked Language Models

Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks. Unfortunately, state-of-the-art models are vulnerable to noisy texts. We propose a new contextual…

Computation and Language · Computer Science 2024-03-06 Yifu Sun, Haoming Jiang

A Stable and Effective Learning Strategy for Trainable Greedy Decoding

Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost.…

Computation and Language · Computer Science 2018-08-29 Yun Chen, Victor O. K. Li, Kyunghyun Cho, Samuel R. Bowman

A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models

Beam search is a desirable choice of test-time decoding algorithm for neural sequence models because it potentially avoids search errors made by simpler greedy methods. However, typical cross entropy training procedures for these models do…

Machine Learning · Computer Science 2017-10-10 Kartik Goyal, Graham Neubig, Chris Dyer, Taylor Berg-Kirkpatrick

A Fully Differentiable Beam Search Decoder

We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g.…

Computation and Language · Computer Science 2019-02-19 Ronan Collobert, Awni Hannun, Gabriel Synnaeve

Self-Attentive Residual Decoder for Neural Machine Translation

Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each…

Computation and Language · Computer Science 2018-10-02 Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram, Andrei Popescu-Belis

NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding

Statistical n-gram language models are widely used for context-biasing tasks in Automatic Speech Recognition (ASR). However, existing implementations lack computational efficiency due to poor parallelization, making context-biasing less…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-30 Vladimir Bataev, Andrei Andrusenko, Lilit Grigoryan, Aleksandr Laptev, Vitaly Lavrukhin, Boris Ginsburg

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…

Computation and Language · Computer Science 2019-09-05 Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

Trainable Greedy Decoding for Neural Machine Translation

Recent research in neural machine translation has largely focused on two aspects: neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research…

Computation and Language · Computer Science 2017-02-09 Jiatao Gu, Kyunghyun Cho, Victor O. K. Li

Blockwise Parallel Decoding for Deep Autoregressive Models

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make…

Machine Learning · Computer Science 2018-11-09 Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

Efficient Beam Search for Large Language Models Using Trie-Based Decoding

This work presents a novel trie (prefix-tree)-based parallel decoding method that addresses the memory inefficiency of batch-based beam search. By sharing a single KV cache across beams with common prefixes, our approach dramatically…

Computation and Language · Computer Science 2025-09-23 Brian J Chan, MaoXun Huang, Jui-Hung Cheng, Chao-Ting Chen, Hen-Hsen Huang

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive problem for the computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Ahmad Asadi, Reza Safabakhsh

Learning to Look: Cognitive Attention Alignment with Vision-Language Models

Convolutional Neural Networks (CNNs) frequently "cheat" by exploiting superficial correlations, raising concerns about whether they make predictions for the right reasons. Inspired by cognitive science, which highlights the role of…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Ryan L. Yang, Dipkamal Bhusal, Nidhi Rastogi

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information

Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. For these tasks, understanding logical and semantic relationship between two sentences…

Computation and Language · Computer Science 2018-11-05 Seonhoon Kim, Inho Kang, Nojun Kwak

Sequential Causal Discovery with Noisy Language Model Priors

Causal discovery from observational data typically assumes access to complete data and availability of perfect domain experts. In practice, data often arrive in batches, are subject to sampling bias, and expert knowledge is scarce. Language…

Machine Learning · Computer Science 2026-05-12 Prakhar Verma, David Arbour, Sunav Choudhary, Harshita Chopra, Arno Solin, Atanu R. Sinha

Neural Machine Translation with Recurrent Attention Modeling

Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future. We improve upon the attention model of Bahdanau et…

Neural and Evolutionary Computing · Computer Science 2016-07-19 Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola

Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models

Large language models have shown remarkable performance across a wide range of language tasks, owing to their exceptional capabilities in context modeling. The most commonly used method of context modeling is full self-attention, as seen in…

Computation and Language · Computer Science 2025-06-26 Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, Hongming Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving…

Machine Learning · Computer Science 2020-10-06 Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

The Neural Noisy Channel

We formulate sequence to sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models. Unlike direct models which can suffer from explaining-away effects during…

Computation and Language · Computer Science 2017-03-07 Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky