Related papers: Decoder-Only or Encoder-Decoder? Interpreting Lang…

Understanding How Encoder-Decoder Architectures Attend

Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However,…

Machine Learning · Computer Science 2021-10-29 Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science 2019-07-16 Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner

Joint Copying and Restricted Generation for Paraphrase

Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq)…

Computation and Language · Computer Science 2016-11-29 Ziqiang Cao, Chuwei Luo, Wenjie Li, Sujian Li

Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction

We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder…

Computation and Language · Computer Science 2016-04-19 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

Current state-of-the-art machine translation systems are based on encoder-decoder architectures, that first encode the input sequence, and then generate an output sequence based on the input encoding. Both are interfaced with an attention…

Computation and Language · Computer Science 2018-11-02 Maha Elbayad, Laurent Besacier, Jakob Verbeek

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

While large language models are primarily used on natural language tasks, they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Most proposed approaches for such cross-modal…

Machine Learning · Computer Science 2026-03-09 Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last…

Computation and Language · Computer Science 2022-08-30 Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun

Is Encoder-Decoder Redundant for Neural Machine Translation?

Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of…

Computation and Language · Computer Science 2022-10-24 Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney

Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target…

Computation and Language · Computer Science 2017-11-06 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Regularized Forward-Backward Decoder for Attention Models

Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, mostly ignore the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

Prior Attention for Style-aware Sequence-to-Sequence Models

We extend sequence-to-sequence models with the possibility to control the characteristics or style of the generated output, via attention that is generated a priori (before decoding) from a latent code vector. After training an initial…

Computation and Language · Computer Science 2018-06-26 Lucas Sterckx, Johannes Deleu, Chris Develder, Thomas Demeester

Causal2Vec: Improving Decoder-only LLMs as Embedding Models through a Contextual Token

Decoder-only large language models (LLMs) have been increasingly adopted to build embedding models for diverse tasks. To overcome the inherent limitations of causal attention in representation learning, many existing methods modify the…

Computation and Language · Computer Science 2026-05-05 Ailiang Lin, Zhuoyun Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks

Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in…

Computation and Language · Computer Science 2025-03-05 Paul Suganthan, Fedor Moiseev, Le Yan, Junru Wu, Jianmo Ni, Jay Han, Imed Zitouni, Enrique Alfonseca, Xuanhui Wang, Zhe Dong

Investigating Linguistic Pattern Ordering in Hierarchical Natural Language Generation

Natural language generation (NLG) is a critical component in spoken dialogue system, which can be divided into two phases: (1) sentence planning: deciding the overall sentence structure, (2) surface realization: determining specific word…

Computation and Language · Computer Science 2018-09-21 Shang-Yu Su, Yun-Nung Chen

Self-Attentive Residual Decoder for Neural Machine Translation

Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each…

Computation and Language · Computer Science 2018-10-02 Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram, Andrei Popescu-Belis

A Tensorized Transformer for Language Modeling

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP)…

Computation and Language · Computer Science 2019-11-07 Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou

Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction

Automatic spelling and grammatical correction systems are one of the most widely used tools within natural language applications. In this thesis, we assume the task of error correction as a type of monolingual machine translation where the…

Computation and Language · Computer Science 2018-10-02 Sina Ahmadi

Multi-Head Decoder for End-to-End Speech Recognition

This paper presents a new network architecture called multi-head decoder for end-to-end speech recognition as an extension of a multi-head attention model. In the multi-head attention model, multiple attentions are calculated, and then,…

Computation and Language · Computer Science 2018-07-31 Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda

Joint Source-Target Self Attention with Locality Constraints

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences. In this paper we study a new architecture that breaks…

Computation and Language · Computer Science 2019-05-17 José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà