English
Related papers

Related papers: Decoder-Only or Encoder-Decoder? Interpreting Lang…

200 papers

Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However,…

Machine Learning · Computer Science 2021-10-29 Kyle Aitken , Vinay V Ramasesh , Yuan Cao , Niru Maheswaranathan

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science 2019-07-16 Johannes Michael , Roger Labahn , Tobias Grüning , Jochen Zöllner

Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq)…

Computation and Language · Computer Science 2016-11-29 Ziqiang Cao , Chuwei Luo , Wenjie Li , Sujian Li

We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder…

Computation and Language · Computer Science 2016-04-19 Allen Schmaltz , Yoon Kim , Alexander M. Rush , Stuart M. Shieber

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui , Honglei Zhuang , Tao Chen , Zhen Qin , Jing Lu , Dara Bahri , Ji Ma , Jai Prakash Gupta , Cicero Nogueira dos Santos , Yi Tay , Don Metzler

Current state-of-the-art machine translation systems are based on encoder-decoder architectures, that first encode the input sequence, and then generate an output sequence based on the input encoding. Both are interfaced with an attention…

Computation and Language · Computer Science 2018-11-02 Maha Elbayad , Laurent Besacier , Jakob Verbeek

While large language models are primarily used on natural language tasks, they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Most proposed approaches for such cross-modal…

Machine Learning · Computer Science 2026-03-09 Paloma García-de-Herreros , Philipp Slusallek , Dietrich Klakow , Vagrant Gautam

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last…

Computation and Language · Computer Science 2022-08-30 Fenglin Liu , Xuancheng Ren , Guangxiang Zhao , Chenyu You , Xuewei Ma , Xian Wu , Xu Sun

Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of…

Computation and Language · Computer Science 2022-10-24 Yingbo Gao , Christian Herold , Zijian Yang , Hermann Ney

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target…

Computation and Language · Computer Science 2017-11-06 Andros Tjandra , Sakriani Sakti , Satoshi Nakamura

Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, mostly ignore the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Tobias Watzel , Ludwig Kürzinger , Lujun Li , Gerhard Rigoll

We extend sequence-to-sequence models with the possibility to control the characteristics or style of the generated output, via attention that is generated a priori (before decoding) from a latent code vector. After training an initial…

Computation and Language · Computer Science 2018-06-26 Lucas Sterckx , Johannes Deleu , Chris Develder , Thomas Demeester

Decoder-only large language models (LLMs) have been increasingly adopted to build embedding models for diverse tasks. To overcome the inherent limitations of causal attention in representation learning, many existing methods modify the…

Computation and Language · Computer Science 2026-05-05 Ailiang Lin , Zhuoyun Li , Yusong Wang , Kotaro Funakoshi , Manabu Okumura

Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in…

Computation and Language · Computer Science 2025-03-05 Paul Suganthan , Fedor Moiseev , Le Yan , Junru Wu , Jianmo Ni , Jay Han , Imed Zitouni , Enrique Alfonseca , Xuanhui Wang , Zhe Dong

Natural language generation (NLG) is a critical component in spoken dialogue system, which can be divided into two phases: (1) sentence planning: deciding the overall sentence structure, (2) surface realization: determining specific word…

Computation and Language · Computer Science 2018-09-21 Shang-Yu Su , Yun-Nung Chen

Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each…

Computation and Language · Computer Science 2018-10-02 Lesly Miculicich Werlen , Nikolaos Pappas , Dhananjay Ram , Andrei Popescu-Belis

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP)…

Computation and Language · Computer Science 2019-11-07 Xindian Ma , Peng Zhang , Shuai Zhang , Nan Duan , Yuexian Hou , Dawei Song , Ming Zhou

Automatic spelling and grammatical correction systems are one of the most widely used tools within natural language applications. In this thesis, we assume the task of error correction as a type of monolingual machine translation where the…

Computation and Language · Computer Science 2018-10-02 Sina Ahmadi

This paper presents a new network architecture called multi-head decoder for end-to-end speech recognition as an extension of a multi-head attention model. In the multi-head attention model, multiple attentions are calculated, and then,…

Computation and Language · Computer Science 2018-07-31 Tomoki Hayashi , Shinji Watanabe , Tomoki Toda , Kazuya Takeda

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences. In this paper we study a new architecture that breaks…

Computation and Language · Computer Science 2019-05-17 José A. R. Fonollosa , Noe Casas , Marta R. Costa-jussà
‹ Prev 1 2 3 10 Next ›