
Related papers: Quantile-Based Policy Optimization for Reinforceme…


Classical reinforcement learning (RL) aims to optimize the expected cumulative reward. In this work, we consider the RL setting where the goal is to optimize the quantile of the cumulative reward. We parameterize the policy controlling…

Machine Learning · Computer Science · 2023-05-15 · Jinyang Jiang, Jiaqiao Hu, Yijie Peng

Aligning large language models with pointwise absolute rewards has so far required online, on-policy algorithms such as PPO and GRPO. In contrast, simpler methods that can leverage offline or off-policy data, such as DPO and REBEL, are…

Machine Learning · Computer Science · 2025-12-02 · Simon Matrenok, Skander Moalla, Caglar Gulcehre

Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes expected cumulative return while satisfying a given constraint. Most of the previous constrained RL works consider expected…

Machine Learning · Computer Science · 2022-11-29 · Whiyoung Jung, Myungsik Cho, Jongeui Park, Youngchul Sung

Contrastive reinforcement learning (CRL) learns goal-conditioned Q-values through a contrastive objective over state-action and goal representations, removing the need for hand-crafted reward functions. Despite impressive success in…

Quantum policy evaluation (QPE) is a reinforcement learning (RL) algorithm which is quadratically more efficient than an analogous classical Monte Carlo estimation. It makes use of a direct quantum mechanical realization of a finite Markov…

Reinforcement learning algorithms are fundamental to align large language models with human preferences and to enhance their reasoning capabilities. However, current reinforcement learning algorithms often suffer from training instability…

Machine Learning · Computer Science · 2025-06-05 · Yaru Hao, Li Dong, Xun Wu, Shaohan Huang, Zewen Chi, Furu Wei

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal resulting in unwanted behavior. While…

Machine Learning · Computer Science · 2018-12-27 · Chen Tessler, Daniel J. Mankowitz, Shie Mannor

We propose a novel Reinforcement Learning (RL) method for optimizing quantum circuits using graph-theoretic simplification rules of ZX-diagrams. The agent, trained using the Proximal Policy Optimization (PPO) algorithm, employs Graph Neural…

Quantum Physics · Physics · 2025-06-04 · Jordi Riu, Jan Nogué, Gerard Vilaplana, Artur Garcia-Saez, Marta P. Estarellas

Quantum machine learning (QML), which combines quantum computing with machine learning, is widely believed to hold the potential to outperform traditional machine learning in the era of noisy intermediate-scale quantum (NISQ). As one of the…

Quantum Physics · Physics · 2025-01-14 · Yu-Xin Jin, Zi-Wei Wang, Hong-Ze Xu, Wei-Feng Zhuang, Meng-Jun Hu, Dong E. Liu

We propose Q-Policy, a hybrid quantum-classical reinforcement learning (RL) framework that mathematically accelerates policy evaluation and optimization by exploiting quantum computing primitives. Q-Policy encodes value functions in quantum…

Machine Learning · Computer Science · 2025-06-10 · Kalyan Cherukuri, Aarav Lala, Yash Yardi

Proximal Policy Optimisation (PPO) is an established and effective policy gradient algorithm used for Language Model Reinforcement Learning from Human Feedback (LM-RLHF). PPO performs well empirically but has a heuristic motivation and…

Computation and Language · Computer Science · 2025-08-26 · Jason R Brown, Lennie Wells, Edward James Young, Sergio Bacallado

Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn heuristics that solve complex problems without requiring expert knowledge. Despite significant progress, existing RL…

Machine Learning · Computer Science · 2025-05-14 · Mingjun Pan, Guanquan Lin, You-Wei Luo, Bin Zhu, Zhien Dai, Lijun Sun, Chun Yuan

Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an…

Quantum Physics · Physics · 2021-12-21 · Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin Roetteler

Quantum computing exploits basic quantum phenomena such as state superposition and entanglement to perform computations. The Quantum Approximate Optimization Algorithm (QAOA) is arguably one of the leading quantum algorithms that can…

Machine Learning · Computer Science · 2022-06-16 · Sami Khairy, Ruslan Shaydulin, Lukasz Cincio, Yuri Alexeev, Prasanna Balaprakash

Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-critic methods perform poorly due to the overestimation of out-of-distribution state-action pairs. Thus, various additional augmentations are…

Machine Learning · Computer Science · 2023-02-23 · Zifeng Zhuang, Kun Lei, Jinxin Liu, Donglin Wang, Yilang Guo

Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn…

Machine Learning · Computer Science · 2026-01-27 · Junbo Li, Peng Zhou, Rui Meng, Meet P. Vadera, Lihong Li, Yang Li

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize…

Machine Learning · Computer Science · 2022-03-24 · Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying…

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL…

Machine Learning · Computer Science · 2024-12-17 · Shutong Ding, Ke Hu, Zhenhao Zhang, Kan Ren, Weinan Zhang, Jingyi Yu, Jingya Wang, Ye Shi

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science · 2017-05-31 · Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel