Related papers: Insertion-Deletion Transformer

Insertion Transformer: Flexible Sequence Generation via Insertion Operations

We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations. Unlike typical autoregressive models which rely on a fixed, often left-to-right ordering of the…

Computation and Language · Computer Science 2019-02-12 Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

Levenshtein Transformer

Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially…

Computation and Language · Computer Science 2019-10-29 Jiatao Gu, Changhan Wang, Jake Zhao

Long Short-Term Sample Distillation

In the past decade, there has been substantial progress at training increasingly deep neural networks. Recent advances within the teacher-student training paradigm have established that information about past training updates shows promise…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Liang Jiang, Zujie Wen, Zhongping Liang, Yafang Wang, Gerard de Melo, Zhe Li, Liangzhuang Ma, Jiaxing Zhang, Xiaolong Li, Yuan Qi

Scalable Transformers for Neural Machine Translation

Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation. However, the deployment of Transformer is challenging because different scenarios require…

Computation and Language · Computer Science 2021-06-21 Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

A Novel Approach to Dropped Pronoun Translation

Dropped Pronouns (DP), in which pronouns are frequently dropped in the source language but should be retained in the target language, are a challenge in machine translation. In response to this problem, we propose a semi-supervised approach to…

Computation and Language · Computer Science 2016-04-22 Longyue Wang, Zhaopeng Tu, Xiaojun Zhang, Hang Li, Andy Way, Qun Liu

Two-Step Sound Source Separation: Training on Learned Latent Targets

In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is…

Machine Learning · Computer Science 2021-05-12 Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis

Context- and Sequence-Aware Convolutional Recurrent Encoder for Neural Machine Translation

A Neural Machine Translation model is a sequence-to-sequence converter based on neural networks. Existing models use recurrent neural networks to construct both the encoder and decoder modules. In alternative research, the recurrent networks…

Computation and Language · Computer Science 2021-05-04 Ritam Mallick, Seba Susan, Vaibhaw Agrawal, Rizul Garg, Prateek Rawal

A Deep Memory-based Architecture for Sequence-to-Sequence Learning

We propose DEEPMEMORY, a novel deep architecture for sequence-to-sequence learning, which performs the task through a series of nonlinear transformations from the representation of the input sequence (e.g., a Chinese sentence) to the final…

Computation and Language · Computer Science 2016-01-08 Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu

Layer-Parallel Training for Transformers

We present a new training methodology for transformers using a multilevel, layer-parallel approach. Through a neural ODE formulation of transformers, our application of a multilevel parallel-in-time algorithm for the forward and…

Machine Learning · Computer Science 2026-01-27 Shuai Jiang, Marc Salvadó-Benasco, Eric C. Cyr, Alena Kopaničáková, Rolf Krause, Jacob B. Schroder

InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model

We propose InsNet, an expressive insertion-based text generator with efficient training and flexible decoding (parallel or sequential). Unlike most existing insertion-based text generation works that require re-encoding of the context after…

Computation and Language · Computer Science 2022-10-18 Sidi Lu, Tao Meng, Nanyun Peng

A Neural Transducer

Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output…

Machine Learning · Computer Science 2016-08-08 Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from unbearable overall computational expenses. Current…

Machine Learning · Computer Science 2020-10-27 Minjia Zhang, Yuxiong He

A CNN-Transformer Deep Learning Model for Real-time Sleep Stage Classification in an Energy-Constrained Wireless Device

This paper proposes a deep learning (DL) model for automatic sleep stage classification based on single-channel EEG data. The DL model features a convolutional neural network (CNN) and transformers. The model was designed to run on energy…

Signal Processing · Electrical Eng. & Systems 2022-11-24 Zongyan Yao, Xilin Liu

Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network. We 'invert' a trained network (teacher) to synthesize class-conditional input images starting from random…

Machine Learning · Computer Science 2020-06-17 Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz

Enhanced Transformer Architecture for Natural Language Processing

Transformer is a state-of-the-art model in the field of natural language processing (NLP). Current NLP models primarily increase the number of transformers to improve processing performance. However, this technique requires a lot of…

Computation and Language · Computer Science 2023-10-18 Woohyeon Moon, Taeyoung Kim, Bumgeun Park, Dongsoo Har

An Efficient Character-Level Neural Machine Translation

Neural machine translation aims at building a single large neural network that can be trained to maximize translation performance. The encoder-decoder architecture with an attention mechanism achieves a translation performance comparable to…

Computation and Language · Computer Science 2016-08-22 Shenjian Zhao, Zhihua Zhang

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-16 Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

Imputer: Sequence Modelling via Imputation and Dynamic Programming

This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations. The Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-23 William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly

Ultra-Long Sequence Distributed Transformer

Transformer models trained on long sequences often achieve higher accuracy than short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-09 Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley

Memory-efficient Stochastic methods for Memory-based Transformers

Training Memory-based transformers can require a large amount of memory and can be quite inefficient. We propose a novel two-phase training mechanism and a novel regularization technique to improve the training efficiency of memory-based…

Machine Learning · Computer Science 2023-11-15 Vishwajit Kumar Vishnu, C. Chandra Sekhar