Related papers: Sentence-wise Smooth Regularization for Sequence t…

Beyond MLE: Convex Learning for Text Generation

Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution that best explain the observed data. In the context of text generation, MLE is often used to train generative language…

Computation and Language · Computer Science 2023-10-27 Chenze Shao, Zhengrui Ma, Min Zhang, Yang Feng

Improving Sequence-to-Sequence Learning via Optimal Transport

Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This…

Computation and Language · Computer Science 2019-01-21 Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin

Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation

Inferring the probability distribution of sentences or word sequences is a key process in natural language processing. While word-level language models (LMs) have been widely adopted for computing the joint probabilities of word sequences,…

Computation and Language · Computer Science 2021-03-16 Heewoong Park, Sukhyun Cho, Jonghun Park

Regularizing Output Distribution of Abstractive Chinese Social Media Text Summarization for Improved Semantic Consistency

Abstractive text summarization is a highly difficult problem, and the sequence-to-sequence model has shown success in improving the performance on the task. However, the generated summaries are often inconsistent with the source content in…

Computation and Language · Computer Science 2018-05-11 Bingzhen Wei, Xuancheng Ren, Xu Sun, Yi Zhang, Xiaoyan Cai, Qi Su

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights…

Computation and Language · Computer Science 2018-02-23 Prajit Ramachandran, Peter J. Liu, Quoc V. Le

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Auto-regressive sequence generative models trained by Maximum Likelihood Estimation suffer the exposure bias problem in practical finite sample scenarios. The crux is that the number of training samples for Maximum Likelihood Estimation is…

Machine Learning · Statistics 2020-07-14 Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

Token-level and sequence-level loss smoothing for RNN language models

Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. It treats all sentences that do not match the ground truth as equally poor, ignoring the structure of…

Computation and Language · Computer Science 2018-05-15 Maha Elbayad, Laurent Besacier, Jakob Verbeek

Smoothed Analysis of Sequential Probability Assignment

We initiate the study of smoothed analysis for the sequential probability assignment problem with contexts. We study information-theoretically optimal minmax rates as well as a framework for algorithmic reduction involving the maximum…

Machine Learning · Computer Science 2023-03-10 Alankrita Bhatt, Nika Haghtalab, Abhishek Shetty

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Connecting the Dots Between MLE and RL for Sequence Prediction

Sequence prediction models can be learned from example sequences with a variety of training algorithms. Maximum likelihood learning is simple and efficient, yet can suffer from compounding error at test time. Reinforcement learning such as…

Machine Learning · Computer Science 2019-07-02 Bowen Tan, Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric Xing

Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation

Subword regularizations use multiple subword segmentations during training to improve the robustness of neural machine translation models. In previous subword regularizations, we use multiple segmentations in the training process but use…

Computation and Language · Computer Science 2022-03-28 Sho Takase, Tatsuya Hiraoka, Naoaki Okazaki

Smooth Imitation Learning for Online Sequence Prediction

We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential…

Machine Learning · Computer Science 2016-06-06 Hoang M. Le, Andrew Kang, Yisong Yue, Peter Carr

ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems

Regularization of neural machine translation is still a significant problem, especially in low-resource settings. To mollify this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is…

Computation and Language · Computer Science 2019-04-05 Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, Massimo Piccardi

Efficient (Soft) Q-Learning for Text Generation with Limited Good Data

Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which is not applicable to many emerging applications, such as generating adversarial…

Computation and Language · Computer Science 2022-10-25 Han Guo, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation

Despite some empirical success at correcting exposure bias in machine translation, scheduled sampling algorithms suffer from a major drawback: they incorrectly assume that words in the reference translations and in sampled sequences are…

Computation and Language · Computer Science 2019-05-07 Weijia Xu, Xing Niu, Marine Carpuat

Regressing Word and Sentence Embeddings for Regularization of Neural Machine Translation

In recent years, neural machine translation (NMT) has become the dominant approach in automated translation. However, like many other deep learning approaches, NMT suffers from overfitting when the amount of training data is limited. This…

Computation and Language · Computer Science 2019-10-01 Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi

Consistency Regularization for Cross-Lingual Fine-Tuning

Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we…

Computation and Language · Computer Science 2021-06-16 Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

Token-wise Curriculum Learning for Neural Machine Translation

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage. This is not always achievable for low-resource languages where…

Computation and Language · Computer Science 2021-03-23 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Selective Output Smoothing Regularization: Regularize Neural Networks by Softening Output Distributions

In this paper, we propose Selective Output Smoothing Regularization, a novel regularization method for training the Convolutional Neural Networks (CNNs). Inspired by the diverse effects on training from different samples, Selective Output…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Xuan Cheng, Tianshu Xie, Xiaomin Wang, Qifeng Weng, Minghui Liu, Jiali Deng, Ming Liu

Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation

Computing next-token likelihood ratios between two language models (LMs) is a standard task in training paradigms such as knowledge distillation. Since this requires both models to share the same probability space, it becomes challenging…

Computation and Language · Computer Science 2026-05-07 Buu Phan, Ashish Khisti, Karen Ullrich