Related papers: RLSS: A Deep Reinforcement Learning Algorithm for …

Angrier Birds: Bayesian reinforcement learning

We train a reinforcement learner to play a simplified version of the game Angry Birds. The learner is provided with a game state in a manner similar to the output that could be produced by computer vision algorithms. We improve on the…

Artificial Intelligence · Computer Science · 2016-01-08 · Imanol Arrieta Ibarra, Bernardo Ramos, Lars Roemheld

Steerable Scene Generation with Post Training and Inference-Time Search

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement,…

Robotics · Computer Science · 2025-08-27 · Nicholas Pfaff, Hongkai Dai, Sergey Zakharov, Shun Iwase, Russ Tedrake

REBEL: Reinforcement Learning via Regressing Relative Rewards

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.…

Machine Learning · Computer Science · 2024-12-11 · Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a…

Machine Learning · Computer Science · 2022-01-07 · Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst

Reward-Machine-Guided, Self-Paced Reinforcement Learning

Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in…

Machine Learning · Computer Science · 2023-05-29 · Cevahir Koprulu, Ufuk Topcu

From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation

Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we…

Machine Learning · Computer Science · 2026-04-23 · Ziwei Huang, Ying Shu, Hao Fang, Quanyu Long, Wenya Wang, Qiushi Guo, Tiezheng Ge, Leilei Gan

A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

Reinforcement Learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from…

Methodology · Statistics · 2022-10-06 · Mauricio Tec, Yunshan Duan, Peter Müller

Reinforcement Learning Driven Heuristic Optimization

Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to…

Machine Learning · Computer Science · 2019-06-18 · Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning

Reinforcement learning (RL) -- algorithms that teach artificial agents to interact with environments by maximising reward signals -- has achieved significant success in recent years. These successes have been facilitated by advances in…

Machine Learning · Computer Science · 2025-04-03 · Llewyn Salt, Marcus Gallagher

Procedural Game Level Design with Deep Reinforcement Learning

Procedural content generation (PCG) has become an increasingly popular technique in game development, allowing developers to generate dynamic, replayable, and scalable environments with reduced manual effort. In this study, a novel method…

Artificial Intelligence · Computer Science · 2025-10-20 · Miraç Buğra Özkan

KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize response for task-related metrics. However, RL needs to perform exploration, which can be time-consuming due to the slow…

Computation and Language · Computer Science · 2023-10-23 · Xiao Yu, Qingyang Wu, Kun Qian, Zhou Yu

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Generative models, particularly diffusion models, have achieved remarkable success in density estimation for multimodal data, drawing significant interest from the reinforcement learning (RL) community, especially in policy modeling in…

Machine Learning · Computer Science · 2024-12-03 · Jinouwen Zhang, Rongkun Xue, Yazhe Niu, Yun Chen, Jing Yang, Hongsheng Li, Yu Liu

PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations

Learning from few demonstrations to develop policies robust to variations in robot initial positions and object poses is a problem of significant practical interest in robotics. Compared to imitation learning, which often struggles to…

Robotics · Computer Science · 2025-04-30 · Haowen Sun, Han Wang, Chengzhong Ma, Shaolong Zhang, Jiawei Ye, Xingyu Chen, Xuguang Lan

Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing

Generating safety-critical scenarios is essential for testing and verifying the safety of autonomous vehicles. Traditional optimization techniques suffer from the curse of dimensionality and limit the search space to fixed parameter spaces.…

Machine Learning · Computer Science · 2024-03-08 · Haolan Liu, Liangjun Zhang, Siva Kumar Sastry Hari, Jishen Zhao

Reparameterized Policy Learning for Multimodal Trajectory Optimization

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used…

Machine Learning · Computer Science · 2023-07-21 · Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

A Practical Introduction to Deep Reinforcement Learning

Deep reinforcement learning (DRL) has emerged as a powerful framework for solving sequential decision-making problems, achieving remarkable success in a wide range of applications, including game AI, autonomous driving, biomedicine, and…

Machine Learning · Computer Science · 2025-05-14 · Yinghan Sun, Hongxi Wang, Hua Chen, Wei Zhang

Training Diffusion Models with Reinforcement Learning

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science · 2024-01-08 · Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization

Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs more closely with nuanced human preferences. In this paper, we investigate the application of Group…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-01 · Matteo Gallici, Haitz Sáez de Ocáriz Borde

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Attention-based sequential recommendation methods have shown promise in accurately capturing users' evolving interests from their past interactions. Recent research has also explored the integration of reinforcement learning (RL) into these…

Machine Learning · Computer Science · 2024-04-19 · Melissa Mozifian, Tristan Sylvain, Dave Evans, Lili Meng

Gradient Shaping for Multi-Constraint Safe Reinforcement Learning

Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints via interacting with the environments. In this paper, our focus lies in addressing the complex challenges…

Machine Learning · Computer Science · 2023-12-27 · Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao