Related papers: Controlled Decoding from Language Models

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. Specifying goals and tasks for autonomous…

Machine Learning · Computer Science · 2019-02-22 · Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama

On the Low-Rank Parametrization of Reward Models for Controlled Language Generation

Language models trained on large amounts of data are known to produce inappropriate content in some cases and require careful tuning to be used in the real world. We revisit an effective and modular approach for controllability of the…

Computation and Language · Computer Science · 2025-09-23 · Sergey Troshin, Vlad Niculae, Antske Fokkens

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally…

Computation and Language · Computer Science · 2024-05-01 · Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin

Conceptual Reinforcement Learning for Language-Conditioned Tasks

Despite the broad application of deep reinforcement learning (RL), transferring and adapting the policy to unseen but similar environments is still a significant challenge. Recently, the language-conditioned policy is proposed to facilitate…

Machine Learning · Computer Science · 2023-03-10 · Shaohui Peng, Xing Hu, Rui Zhang, Jiaming Guo, Qi Yi, Ruizhi Chen, Zidong Du, Ling Li, Qi Guo, Yunji Chen

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large datasets have achieved success, there is often a need to…

Machine Learning · Computer Science · 2025-02-25 · Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Sunyuan Kung, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali

Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning

Controlling output length in neural language generation is valuable in many scenarios, especially for the tasks that have length constraints. A model with stronger length control capacity can produce sentences with more specific length,…

Computation and Language · Computer Science · 2019-09-23 · Junyi Bian, Baojun Lin, Ke Zhang, Zhaohui Yan, Hong Tang, Yonghe Zhang

Consultant Decoding: Yet Another Synergistic Mechanism

The synergistic mechanism based on Speculative Decoding (SD) has garnered considerable attention as a simple yet effective approach for accelerating the inference of large language models (LLMs). Nonetheless, the high rejection rates…

Computation and Language · Computer Science · 2025-06-04 · Chuanghao Ding, Jiaping Wang, Ziqing Yang, Xiaoliang Wang, Dahua Lin, Cam-Tu Nguyen, Fei Tan

Critic-Guided Decoding for Controlled Text Generation

Steering language generation towards objectives or away from undesired content has been a long-standing goal in utilizing language models (LM). Recent work has demonstrated reinforcement learning and weighted decoding as effective…

Computation and Language · Computer Science · 2022-12-22 · Minbeom Kim, Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

While large language models have proven effective in a huge range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text…

Computation and Language · Computer Science · 2024-01-03 · Haikang Deng, Colin Raffel

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm of applying large language models for numerical prediction. However, its progress is hindered by the misalignment…

Machine Learning · Computer Science · 2025-12-09 · Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian

Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control

Goal-conditioned reinforcement learning (RL) concerns the problem of training an agent to maximize the probability of reaching target goal states. This paper presents an analysis of the goal-conditioned setting based on optimal control. In…

Machine Learning · Computer Science · 2026-05-15 · Nathan P. Lawrence, Ali Mesbah

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

When language models (LMs) are trained via reinforcement learning (RL) to generate natural language "reasoning chains", their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications…

Machine Learning · Computer Science · 2026-05-18 · Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas

A Controlled Reevaluation of Coreference Resolution Models

All state-of-the-art coreference resolution (CR) models involve finetuning a pretrained language model. Whether the superior performance of one CR model over another is due to the choice of language model or other factors, such as the…

Computation and Language · Computer Science · 2024-04-24 · Ian Porada, Xiyuan Zou, Jackie Chi Kit Cheung

Reinforcement Learning with Conditional Expectation Reward

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective in enhancing the reasoning capabilities of large language models, particularly in domains such as mathematics where reliable rule-based verifiers can be constructed.…

Machine Learning · Computer Science · 2026-03-12 · Changyi Xiao, Caijun Xu, Yixin Cao

Reinforcement Learning with Temporal-Logic-Based Causal Diagrams

We study a class of reinforcement learning (RL) tasks where the objective of the agent is to accomplish temporally extended goals. In this setting, a common approach is to represent the tasks as deterministic finite automata (DFA) and…

Artificial Intelligence · Computer Science · 2023-06-27 · Yash Paliwal, Rajarshi Roy, Jean-Raphaël Gaglione, Nasim Baharisangari, Daniel Neider, Xiaoming Duan, Ufuk Topcu, Zhe Xu

Efficient Reinforcement Learning for Unsupervised Controlled Text Generation

Controlled text generation tasks such as unsupervised text style transfer have increasingly adopted the use of Reinforcement Learning (RL). A major challenge in applying RL to such tasks is the sparse reward, which is available only after…

Computation and Language · Computer Science · 2022-04-19 · Bhargav Upadhyay, Akhilesh Sudhakar, Arjun Maheswaran

KL-Regularized Reinforcement Learning is Designed to Mode Collapse

It is commonly believed that optimizing the reverse KL divergence results in "mode seeking", while optimizing forward KL results in "mass covering", with the latter being preferred if the goal is to sample from multiple diverse modes. We…

Machine Learning · Computer Science · 2025-10-24 · Anthony GX-Chen, Jatin Prakash, Jeff Guo, Rob Fergus, Rajesh Ranganath

DeAL: Decoding-time Alignment for Large Language Models

Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF).…

Artificial Intelligence · Computer Science · 2026-01-21 · James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems

To solve Math Word Problems, human students leverage diverse reasoning logic that reaches different possible equation solutions. However, the mainstream sequence-to-sequence approach of automatic solvers aims to decode a fixed solution…

Computation and Language · Computer Science · 2022-12-01 · Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng, Sadao Kurohashi

Reinforcement Learning Agent Training with Goals for Real World Tasks

Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks. However, designing reward functions for complex tasks (e.g., with multiple objectives and safety…

Artificial Intelligence · Computer Science · 2021-07-23 · Xuan Zhao, Marcos Campos