Related papers: Nested-Wasserstein Self-Imitation Learning for Seq…

Sequence Generation with Guider Network

Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only…

Computation and Language · Computer Science 2018-11-05 Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Liqun Chen, Dinghan Shen, Guoyin Wang, Lawrence Carin

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.…

Computation and Language · Computer Science 2023-08-07 Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu

Reward-Machine-Guided, Self-Paced Reinforcement Learning

Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in…

Machine Learning · Computer Science 2023-05-29 Cevahir Koprulu, Ufuk Topcu

Sequence-to-Sequence ASR Optimization via Reinforcement Learning

Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions. In the…

Computation and Language · Computer Science 2018-03-01 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Auxiliary Reward Generation with Transition Distance Representation Learning

Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems. The reward function in RL is crucial to the learning performance, as it serves as a measure of the task completion degree. In real-world…

Machine Learning · Computer Science 2024-02-13 Siyuan Li, Shijie Han, Yingnan Zhao, By Liang, Peng Liu

Efficient Wasserstein Natural Gradients for Reinforcement Learning

A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that…

Machine Learning · Computer Science 2021-03-19 Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps

Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset, making it particularly valuable in scenarios where data collection is costly, such as robotics. A major challenge in offline RL is distributional…

Machine Learning · Computer Science 2025-07-16 Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, Tatsuya Harada

Distributional Reinforcement Learning with Regularized Wasserstein Loss

The empirical success of distributional reinforcement learning (RL) highly relies on the choice of distribution divergence equipped with an appropriate distribution representation. In this paper, we propose Sinkhorn distributional…

Machine Learning · Computer Science 2024-10-16 Ke Sun, Yingnan Zhao, Wulong Liu, Bei Jiang, Linglong Kong

A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

Reinforcement Learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from…

Methodology · Statistics 2022-10-06 Mauricio Tec, Yunshan Duan, Peter Müller

Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization

Recent advancements in reinforcement learning (RL) have achieved great success in fine-tuning diffusion-based generative models. However, fine-tuning continuous flow-based generative models to align with arbitrary user-defined reward…

Machine Learning · Computer Science 2025-02-11 Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Attention-based sequential recommendation methods have shown promise in accurately capturing users' evolving interests from their past interactions. Recent research has also explored the integration of reinforcement learning (RL) into these…

Machine Learning · Computer Science 2024-04-19 Melissa Mozifian, Tristan Sylvain, Dave Evans, Lili Meng

Improving Sequence-to-Sequence Learning via Optimal Transport

Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This…

Computation and Language · Computer Science 2019-01-21 Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin

Efficient (Soft) Q-Learning for Text Generation with Limited Good Data

Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which is not applicable to many emerging applications, such as generating adversarial…

Computation and Language · Computer Science 2022-10-25 Han Guo, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data

Recent advances have demonstrated the effectiveness of Reinforcement Learning (RL) in improving the reasoning capabilities of Large Language Models (LLMs). However, existing works inevitably rely on high-quality instructions and verifiable…

Computation and Language · Computer Science 2026-01-27 Wenkai Fang, Shunyu Liu, Yang Zhou, Kongcheng Zhang, Tongya Zheng, Kaixuan Chen, Mingli Song, Dacheng Tao

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages

Reinforcement learning (RL) has been effective for post-training autoregressive (AR) language models, but extending these methods to diffusion language models (DLMs) is challenging due to intractable sequence-level likelihoods. Existing…

Machine Learning · Computer Science 2026-05-15 Vishnu Teja Kunde, Fatemeh Doudi, Mahdi Farahbakhsh, Dileep Kalathil, Krishna Narayanan, Jean-Francois Chamberland

Adversarial Intrinsic Motivation for Reinforcement Learning

Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1…

Machine Learning · Computer Science 2021-10-29 Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation

Reinforcement learning (RL) is an effective approach to learn an optimal dialog policy for task-oriented visual dialog systems. A common practice is to apply RL on a neural sequence-to-sequence (seq2seq) framework with the action space…

Computation and Language · Computer Science 2019-10-30 Mingyang Zhou, Josh Arnold, Zhou Yu

Gaussian Prior Reinforcement Learning for Nested Named Entity Recognition

Named Entity Recognition (NER) is a well-known and widely studied task in natural language processing. Recently, nested NER has attracted more attention because of its practicality and difficulty. Existing works for nested NER ignore the…

Computation and Language · Computer Science 2023-05-15 Yawen Yang, Xuming Hu, Fukun Ma, Shu'ang Li, Aiwei Liu, Lijie Wen, Philip S. Yu

Imitating Language via Scalable Inverse Reinforcement Learning

The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability…

Machine Learning · Computer Science 2024-12-10 Markus Wulfmeier, Michael Bloesch, Nino Vieillard, Arun Ahuja, Jorg Bornschein, Sandy Huang, Artem Sokolov, Matt Barnes, Guillaume Desjardins, Alex Bewley, Sarah Maria Elisabeth Bechtle, Jost Tobias Springenberg, Nikola Momchev, Olivier Bachem, Matthieu Geist, Martin Riedmiller

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the…

Machine Learning · Computer Science 2024-03-19 Eloi Tanguy