Related papers: Understanding How Encoder-Decoder Architectures At…

Describing Multimedia Content using Attention-based Encoder-Decoder Networks

Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint…

Neural and Evolutionary Computing · Computer Science · 2016-11-15 · Kyunghyun Cho, Aaron Courville, Yoshua Bengio

Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition

Deep neural networks are largely used for complex prediction tasks. There is plenty of empirical evidence of their successful end-to-end training for a diversity of tasks. Success is often measured based solely on the final performance of…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-12 · Sergio Y. Hayashi, Nina S. T. Hirata

Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

Current state-of-the-art machine translation systems are based on encoder-decoder architectures, that first encode the input sequence, and then generate an output sequence based on the input encoding. Both are interfaced with an attention…

Computation and Language · Computer Science · 2018-11-02 · Maha Elbayad, Laurent Besacier, Jakob Verbeek

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-16 · Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to…

Computation and Language · Computer Science · 2023-04-11 · Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Focused Hierarchical RNNs for Conditional Sequence Processing

Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and…

Machine Learning · Statistics · 2018-06-13 · Nan Rosemary Ke, Konrad Zolna, Alessandro Sordoni, Zhouhan Lin, Adam Trischler, Yoshua Bengio, Joelle Pineau, Laurent Charlin, Chris Pal

Understanding Matching Mechanisms in Cross-Encoders

Neural IR architectures, particularly cross-encoders, are highly effective models whose internal mechanisms are mostly unknown. Most works trying to explain their behavior focused on high-level processes (e.g., what in the input influences…

Information Retrieval · Computer Science · 2025-07-22 · Mathias Vast, Basile Van Cooten, Laure Soulier, Benjamin Piwowarski

Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target…

Computation and Language · Computer Science · 2017-11-06 · Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Attention-based Neural Load Forecasting: A Dynamic Feature Selection Approach

Encoder-decoder-based recurrent neural network (RNN) has made significant progress in sequence-to-sequence learning tasks such as machine translation and conversational models. Recent works have shown the advantage of this type of network…

Machine Learning · Computer Science · 2023-05-10 · Jing Xiong, Pengyang Zhou, Alan Chen, Yu Zhang

How Transformers Learn Causal Structure with Gradient Descent

The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows…

Machine Learning · Computer Science · 2024-08-14 · Eshaan Nichani, Alex Damian, Jason D. Lee

Contextually Structured Token Dependency Encoding for Large Language Models

Token representation strategies within large-scale neural architectures often rely on contextually refined embeddings, yet conventional approaches seldom encode structured relationships explicitly within token interactions. Self-attention…

Computation and Language · Computer Science · 2025-03-27 · James Blades, Frederick Somerfield, William Langley, Susan Everingham, Maurice Witherington

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-30 · Hila Chefer, Shir Gur, Lior Wolf

Input Combination Strategies for Multi-Source Transformer Decoder

In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied on recurrent architectures. In this paper, we extend the previous work to the encoder-decoder…

Computation and Language · Computer Science · 2018-11-13 · Jindřich Libovický, Jindřich Helcl, David Mareček

Encoding-based Memory Modules for Recurrent Neural Networks

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design…

Machine Learning · Computer Science · 2020-02-03 · Antonio Carta, Alessandro Sperduti, Davide Bacciu

Modelling Sentence Pairs with Tree-structured Attentive Encoder

We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we…

Computation and Language · Computer Science · 2016-10-11 · Yao Zhou, Cong Liu, Yan Pan

Towards a universal neural network encoder for time series

We study the use of a time series encoder to learn representations that are useful on data set types with which it has not been trained on. The encoder is formed of a convolutional neural network whose temporal output is summarized by a…

Machine Learning · Computer Science · 2018-05-11 · Joan Serrà, Santiago Pascual, Alexandros Karatzoglou

Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model

A sequence-to-sequence model is a neural network module for mapping two sequences of different lengths. The sequence-to-sequence model has three core modules: encoder, decoder, and attention. Attention is the bridge that connects the…

Computation and Language · Computer Science · 2018-07-24 · Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Graph-based Neural Modules to Inspect Attention-based Architectures: A Position Paper

Encoder-decoder architectures are prominent building blocks of state-of-the-art solutions for tasks across multiple fields where deep learning (DL) or foundation models play a key role. Although there is a growing community working on the…

Machine Learning · Computer Science · 2022-10-14 · Breno W. Carvalho, Artur D'Avilla Garcez, Luis C. Lamb

Is Encoder-Decoder Redundant for Neural Machine Translation?

Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of…

Computation and Language · Computer Science · 2022-10-24 · Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney

Structured Attention Networks

Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training.…

Computation and Language · Computer Science · 2017-02-17 · Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush