Related papers: A Deep Memory-based Architecture for Sequence-to-S…

Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation

This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture, where we employ a multi-dimensional long short-term memory (MDLSTM) for translation modeling. In the state-of-the-art…

Computation and Language · Computer Science 2018-10-10 Parnia Bahar, Christopher Brix, Hermann Ney

A Hierarchical Neural Network for Sequence-to-Sequences Learning

In recent years, the sequence-to-sequence learning neural networks with attention mechanism have achieved great progress. However, there are still challenges, especially for Neural Machine Translation (NMT), such as lower translation…

Computation and Language · Computer Science 2018-11-26 Si Zuo, Zhimin Xu

Structured Memory based Deep Model to Detect as well as Characterize Novel Inputs

While deep learning has pushed the boundaries in various machine learning tasks, the current models are still far away from replicating many functions that a normal human brain can do. Explicit memorization based deep architecture have been…

Computer Vision and Pattern Recognition · Computer Science 2018-01-31 Pratik Prabhanjan Brahma, Qiuyuan Huang, Dapeng Wu

Neural Machine Translation and Sequence-to-sequence Models: A Tutorial

This tutorial introduces a new and powerful set of techniques variously called "neural machine translation" or "neural sequence-to-sequence models". These techniques have been used in a number of tasks regarding the handling of human…

Computation and Language · Computer Science 2017-03-07 Graham Neubig

Sequence-to-Sequence Models Can Directly Translate Foreign Speech

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another. The model does not explicitly transcribe the speech into text in the source language, nor does it…

Computation and Language · Computer Science 2017-06-13 Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

Memory and attention in deep learning

Intelligence necessitates memory. Without memory, humans fail to perform various nontrivial tasks such as reading novels, playing games or solving maths. As the ultimate goal of machine learning is to derive intelligent systems that learn…

Machine Learning · Computer Science 2021-07-06 Hung Le

Convolutional Sequence to Sequence Learning

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to…

Computation and Language · Computer Science 2017-07-26 Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Opening the black box of language acquisition

Recent advances in large language models using deep learning techniques have renewed interest on how languages can be learned from data. However, it is unclear whether or how these models represent grammatical information from the learned…

Computation and Language · Computer Science 2024-02-20 Jérôme Michaud, Anna Jon-and

Sequence to Sequence Learning with Neural Networks

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to…

Computation and Language · Computer Science 2014-12-16 Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search

Past work has long recognized the important role of context in guiding how humans search their memory. While context-based memory models can explain many memory phenomena, it remains unclear why humans develop such architectures over…

Neurons and Cognition · Quantitative Biology 2025-06-24 Nikolaus Salvatore, Qiong Zhang

MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning

In sequence to sequence learning, the self-attention mechanism proves to be highly effective, and achieves significant improvements in many tasks. However, the self-attention mechanism is not without its own flaws. Although self-attention…

Computation and Language · Computer Science 2019-11-22 Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo

Concept Learning through Deep Reinforcement Learning with Memory-Augmented Neural Networks

Deep neural networks have shown superior performance in many regimes to remember familiar patterns with large amounts of data. However, the standard supervised deep learning paradigm is still limited when facing the need to learn new…

Machine Learning · Computer Science 2018-11-16 Jing Shi, Jiaming Xu, Yiqun Yao, Bo Xu

Long Short-Term Memory-Networks for Machine Reading

In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning…

Computation and Language · Computer Science 2016-09-22 Jianpeng Cheng, Li Dong, Mirella Lapata

A Neural Architecture Mimicking Humans End-to-End for Natural Language Inference

In this work we use the recent advances in representation learning to propose a neural architecture for the problem of natural language inference. Our approach is aligned to mimic how a human does the natural language inference process…

Computation and Language · Computer Science 2017-01-30 Biswajit Paria, K. M. Annervaz, Ambedkar Dukkipati, Ankush Chatterjee, Sanjay Podder

Extending Memory for Language Modelling

Breakthroughs in deep learning and memory networks have made major advances in natural language understanding. Language is sequential and information carried through the sequence can be captured through memory networks. Learning the…

Computation and Language · Computer Science 2023-05-22 Anupiya Nugaliyadde

Learning to Transduce with Unbounded Memory

Recently, strong results have been demonstrated by Deep Recurrent Neural Networks on natural language transduction problems. In this paper we explore the representational power of these models using synthetic grammars designed to exhibit…

Neural and Evolutionary Computing · Computer Science 2015-11-04 Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom

MeMo: Towards Language Models with Associative Memory Mechanisms

Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this paper, we propose a paradigm shift by designing an architecture to memorize text directly, bearing in mind the principle…

Computation and Language · Computer Science 2025-06-23 Fabio Massimo Zanzotto, Elena Sofia Ruzzetti, Giancarlo A. Xompero, Leonardo Ranaldi, Davide Venditti, Federico Ranaldi, Cristina Giannone, Andrea Favalli, Raniero Romagnoli

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works utilize linear interpolation or a fusion network to integrate external language models. However, these approaches…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-16 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen

Deep Sequential Neural Network

Neural Networks sequentially build high-level features through their successive layers. We propose here a new neural network model where each layer is associated with a set of candidate mappings. When an input is processed, at each layer,…

Machine Learning · Computer Science 2014-10-03 Ludovic Denoyer, Patrick Gallinari

On using 2D sequence-to-sequence models for speech recognition

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more…

Computation and Language · Computer Science 2019-11-21 Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney