Related papers: An Actor-Critic Algorithm for Sequence Prediction

Efficient Sequence Labeling with Actor-Critic Training

Neural approaches to sequence labeling often use a Conditional Random Field (CRF) to model their output dependencies, while Recurrent Neural Networks (RNN) are used for the same purpose in other tasks. We set out to establish RNNs as an…

Machine Learning · Computer Science 2018-10-02 Saeed Najafi, Colin Cherry, Grzegorz Kondrak

Actor-Critic based Training Framework for Abstractive Summarization

We present a training framework for neural abstractive summarization based on actor-critic approaches from reinforcement learning. In the traditional neural network based methods, the objective is only to maximize the likelihood of the…

Computation and Language · Computer Science 2018-08-16 Piji Li, Lidong Bing, Wai Lam

Actor-Critic Sequence Training for Image Captioning

Generating natural language descriptions of images is an important capability for a robot or other visual-intelligence driven AI agent that may need to communicate with human users about what it is seeing. Such image captioning methods are…

Computer Vision and Pattern Recognition · Computer Science 2017-11-29 Li Zhang, Flood Sung, Feng Liu, Tao Xiang, Shaogang Gong, Yongxin Yang, Timothy M. Hospedales

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the…

Machine Learning · Computer Science 2015-09-24 Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning

Training critiquing language models to assess and provide feedback on model outputs is a promising way to improve LLMs for complex reasoning tasks. However, existing approaches typically rely on stronger supervisors for annotating critique…

Computation and Language · Computer Science 2025-10-29 Zhiheng Xi, Jixuan Huang, Xin Guo, Boyang Hong, Dingwen Yang, Xiaoran Fan, Shuo Li, Zehui Chen, Junjie Ye, Siyu Yuan, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning.…

Machine Learning · Computer Science 2017-06-30 Flood Sung, Li Zhang, Tao Xiang, Timothy Hospedales, Yongxin Yang

Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis

The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained…

Machine Learning · Computer Science 2023-10-23 Philip John Gorinski, Matthieu Zimmer, Gerasimos Lampouras, Derrick Goh Xin Deik, Ignacio Iacobacci

Actor-Critic Pretraining for Proximal Policy Optimization

Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of…

Machine Learning · Computer Science 2026-03-02 Andreas Kernbach, Amr Elsheikh, Nicolas Grupp, René Nagel, Marco F. Huber

Teaching Language Models to Critique via Reinforcement Learning

Teaching large language models (LLMs) to critique and refine their outputs is crucial for building systems that can iteratively improve, yet it is fundamentally limited by the ability to provide accurate judgments and actionable…

Machine Learning · Computer Science 2025-12-02 Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong

Learning to Decode for Future Success

We introduce a simple, general strategy to manipulate the behavior of a neural decoder that enables it to generate outputs that have specific properties of interest (e.g., sequences of a pre-specified length). The model can be thought of as…

Computation and Language · Computer Science 2017-02-07 Jiwei Li, Will Monroe, Dan Jurafsky

A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity in reference sentences describing remote sensing data,…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Ruchika Chavhan, Biplab Banerjee, Xiao Xiang Zhu, Subhasis Chaudhuri

Critic-Guided Decoding for Controlled Text Generation

Steering language generation towards objectives or away from undesired content has been a long-standing goal in utilizing language models (LM). Recent work has demonstrated reinforcement learning and weighted decoding as effective…

Computation and Language · Computer Science 2022-12-22 Minbeom Kim, Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from poor performance in the early stages of learning. This is especially problematic for on-line learning with…

Computation and Language · Computer Science 2017-07-06 Pei-Hao Su, Pawel Budzianowski, Stefan Ultes, Milica Gasic, Steve Young

Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

Actor-critic methods integrating target networks have exhibited stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in…

Machine Learning · Computer Science 2022-02-24 Anas Barakat, Pascal Bianchi, Julien Lehmann

Re-ENACT: Reinforcement Learning for Emotional Speech Generation using Actor-Critic Strategy

In this paper, we propose the first method to modify the prosodic features of a given speech signal using an actor-critic reinforcement learning strategy. Our approach uses a Bayesian framework to identify contiguous segments of importance…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-06 Ravi Shankar, Archana Venkataraman

Reinforcement Learning Based Symbolic Regression for Load Modeling

With the increasing penetration of renewable energy sources, growing demand variability, and evolving grid control strategies, accurate and efficient load modeling has become a critical yet challenging task. Traditional methods, such as…

Systems and Control · Electrical Eng. & Systems 2025-03-11 Ding Lin, Han Guo, Jianhui Wang, Meng Yue, Tianqiao Zhao

Sequence Level Training with Recurrent Neural Networks

Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time…

Machine Learning · Computer Science 2016-05-10 Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

Connecting Generative Adversarial Networks and Actor-Critic Methods

Both generative adversarial networks (GAN) in unsupervised learning and actor-critic methods in reinforcement learning (RL) have gained a reputation for being difficult to optimize. Practitioners in both fields have amassed a large number…

Machine Learning · Computer Science 2017-01-19 David Pfau, Oriol Vinyals

Sequence-to-Sequence ASR Optimization via Reinforcement Learning

Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions. In the…

Computation and Language · Computer Science 2018-03-01 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura