Related papers: Parallel Scheduled Sampling

Scheduled Sampling for Transformers

Scheduled sampling is a technique for avoiding one of the known problems in sequence-to-sequence generation: exposure bias. It consists of feeding the model a mix of the teacher forced embeddings and the model predictions from the previous…

Computation and Language · Computer Science · 2019-06-28 · Tsvetomila Mihaylova, André F. T. Martins

Attention Forcing for Sequence-to-sequence Model Training

Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach,…

Machine Learning · Computer Science · 2019-10-04 · Qingyun Dou, Yiting Lu, Joshua Efiong, Mark J. F. Gales

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the…

Machine Learning · Computer Science · 2015-09-24 · Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Professor Forcing: A New Algorithm for Training Recurrent Networks

The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing…

Machine Learning · Statistics · 2016-10-31 · Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation

Scheduled sampling is widely used to mitigate the exposure bias problem for neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus…

Computation and Language · Computer Science · 2021-09-01 · Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

Automatically Planning Optimal Parallel Strategy for Large Language Models

The number of parameters in large-scale language models based on transformers is gradually increasing, and the scale of computing clusters is also growing. The technology of quickly mobilizing large amounts of computing resources for…

Artificial Intelligence · Computer Science · 2025-01-03 · Zongbiao Li, Xiezhao Li, Yinghao Cui, Yijun Chen, Zhixuan Gu, Yuxuan Liu, Wenbo Zhu, Fei Jia, Ke Liu, Qifeng Li, Junyao Zhan, Jiangtao Zhou, Chenxi Zhang, Qike Liu

Flipped Classroom: Effective Teaching for Time Series Forecasting

Sequence-to-sequence models based on LSTM and GRU are a most popular choice for forecasting time series data reaching state-of-the-art performance. Training such models can be delicate though. The two most common training strategies within…

Machine Learning · Computer Science · 2022-10-18 · Philipp Teutsch, Patrick Mäder

Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency

Developing an educational test can be expensive and time-consuming, as each item must be written by experts and then evaluated by collecting hundreds of student responses. Moreover, many tests require multiple distinct sets of questions…

Computation and Language · Computer Science · 2023-10-11 · Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber

Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation

State-of-the-art neural text generation models are typically trained to maximize the likelihood of each token in the ground-truth sequence conditioned on the previous target tokens. However, during inference, the model needs to make a…

Computation and Language · Computer Science · 2023-02-01 · Xiang Lin, Prathyusha Jwalapuram, Shafiq Joty

On Compressing Sequences for Self-Supervised Speech Models

Compressing self-supervised models has become increasingly necessary, as self-supervised models become larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in…

Computation and Language · Computer Science · 2022-10-26 · Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, Hao Tang

How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?

Modern applications and progress in deep learning research have created renewed interest for generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these…

Machine Learning · Statistics · 2015-11-17 · Ferenc Huszár

Faster Training of Diffusion Models and Improved Density Estimation via Parallel Score Matching

In Diffusion Probabilistic Models (DPMs), the task of modeling the score evolution via a single time-dependent neural network necessitates extended training periods and may potentially impede modeling flexibility and capacity. To counteract…

Machine Learning · Computer Science · 2023-06-06 · Etrit Haxholli, Marco Lorenzi

Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios

In order to reveal the rationale behind model predictions, many works have exploited providing explanations in various forms. Recently, to further guarantee readability, more and more works turn to generate sentence-level human language…

Computation and Language · Computer Science · 2023-02-22 · Yan Liu, Xiaokang Chen, Qi Dai

Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

When the amount of parallel sentences available to train a neural machine translation is scarce, a common practice is to generate new synthetic training samples from them. A number of approaches have been proposed to produce synthetic…

Computation and Language · Computer Science · 2024-01-30 · Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Revisiting Self-Training for Neural Sequence Generation

Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's prediction (i.e. the pseudo-parallel data). While self-training…

Machine Learning · Computer Science · 2020-10-20 · Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

ParallelSpec: Parallel Drafter for Efficient Speculative Decoding

Speculative decoding has proven to be an efficient solution to large language model (LLM) inference, where the small drafter predicts future tokens at a low cost, and the target model is leveraged to verify them in parallel. However, most…

Computation and Language · Computer Science · 2024-10-10 · Zilin Xiao, Hongming Zhang, Tao Ge, Siru Ouyang, Vicente Ordonez, Dong Yu

A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining

Self-supervised Pretrained Models (PTMs) have demonstrated remarkable performance in computer vision and natural language processing tasks. These successes have prompted researchers to design PTMs for time series data. In our experiments,…

Machine Learning · Computer Science · 2024-08-16 · Audrey Der, Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Zhongfang Zhuang, Vivian Lai, Junpeng Wang, Liang Wang, Wei Zhang, Eamonn Keogh

Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks

This paper considers a class of reinforcement learning problems, which involve systems with two types of states: stochastic and pseudo-stochastic. In such systems, stochastic states follow a stochastic transition kernel while the…

Machine Learning · Computer Science · 2023-11-09 · Honghao Wei, Xin Liu, Weina Wang, Lei Ying

Does the Order of Training Samples Matter? Improving Neural Data-to-Text Generation with Curriculum Learning

Recent advancements in data-to-text generation largely take on the form of neural end-to-end systems. Efforts have been dedicated to improving text generation systems by changing the order of training samples in a process known as…

Computation and Language · Computer Science · 2021-02-09 · Ernie Chang, Hui-Syuan Yeh, Vera Demberg

Parallel Attention Forcing for Machine Translation

Attention-based autoregressive models have achieved state-of-the-art performance in various sequence-to-sequence tasks, including Text-To-Speech (TTS) and Neural Machine Translation (NMT), but can be difficult to train. The standard…

Computation and Language · Computer Science · 2022-11-08 · Qingyun Dou, Mark Gales