Related papers: CAQL: Continuous Action Q-Learning

Q-learning for Optimal Control of Continuous-time Systems

In this paper, two Q-learning (QL) methods are proposed and their convergence theories are established for addressing the model-free optimal control problem of general nonlinear continuous-time systems. By introducing the Q-function for…

Systems and Control · Computer Science · 2014-10-14 · Biao Luo, Derong Liu, Tingwen Huang

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected,…

Machine Learning · Computer Science · 2020-08-20 · Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

Meta-Q-Learning

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if…

Machine Learning · Computer Science · 2020-04-07 · Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization…

Machine Learning · Computer Science · 2020-01-23 · Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

Q-Learning Lagrange Policies for Multi-Action Restless Bandits

Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work studies only the offline setting where problem dynamics are…

Machine Learning · Computer Science · 2021-06-24 · Jackson A. Killian, Arpita Biswas, Sanket Shah, Milind Tambe

Model-Augmented Q-learning

In recent years, $Q$-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect the policy…

Machine Learning · Computer Science · 2021-02-09 · Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression

Offline reinforcement learning (RL) enables policy learning from fixed datasets without further environment interaction, making it particularly valuable in high-risk or costly domains. Extreme $Q$-Learning (XQL) is a recent offline RL…

Machine Learning · Computer Science · 2026-04-15 · Xinming Gao, Shangzhe Li, Yujin Cai, Wenwu Yu

Automaton Constrained Q-Learning

Real-world robotic tasks often require agents to achieve sequences of goals while respecting time-varying safety constraints. However, standard Reinforcement Learning (RL) paradigms are fundamentally limited in these settings. A natural…

Robotics · Computer Science · 2025-12-02 · Anastasios Manganaris, Vittorio Giammarino, Ahmed H. Qureshi

Coarse Q-learning: Indifference, Indeterminacy, and Instability

We introduce Coarse Q-learning (CQL), a reinforcement-learning model for bandit problems with stochastically varying menus. Alternatives are exogenously partitioned into similarity classes, and feedback from sampled alternatives is pooled…

Theoretical Economics · Economics · 2026-05-13 · Philippe Jehiel, Aviman Satpathy

Q-learning with Adjoint Matching

We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching…

Machine Learning · Computer Science · 2026-05-20 · Qiyang Li, Sergey Levine

Bridging the Gap Between Value and Policy Based Reinforcement Learning

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that…

Artificial Intelligence · Computer Science · 2017-11-27 · Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning

Multi-Agent Reinforcement Learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multi-agent setting can be very difficult as…

Machine Learning · Computer Science · 2022-05-31 · Rafael Pina, Varuna De Silva, Joosep Hook, Ahmet Kondoz

Solving optimal stopping problems with Deep Q-Learning

We propose a reinforcement learning (RL) approach to model optimal exercise strategies for option-type products. We pursue the RL avenue in order to learn the optimal action-value function of the underlying stopping problem. In addition to…

Pricing of Securities · Quantitative Finance · 2024-06-27 · John Ery, Loris Michel

Continuous Control with Coarse-to-fine Reinforcement Learning

Despite recent advances in improving the sample-efficiency of reinforcement learning (RL) algorithms, designing an RL algorithm that can be practically deployed in real-world environments remains a challenge. In this paper, we present…

Robotics · Computer Science · 2024-07-11 · Younggyo Seo, Jafar Uruç, Stephen James

Deep Reinforcement Learning with Weighted Q-Learning

Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be…

Machine Learning · Computer Science · 2022-06-14 · Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi

Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors

Multi-agent reinforcement learning (MARL) has become a significant research topic due to its ability to facilitate learning in complex environments. In multi-agent tasks, the state-action value, commonly referred to as the Q-value, can vary…

Artificial Intelligence · Computer Science · 2024-06-13 · Zhenglong Luo, Zhiyong Chen, James Welsh

Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks

In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically,…

Systems and Control · Electrical Eng. & Systems · 2023-11-30 · Xianlun Peng, Yang Tang, Fangfei Li, Yang Liu

BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning

Offline inverse reinforcement learning (IRL) aims to recover a reward function that explains expert behavior using only fixed demonstration data, without any additional online interaction. We propose BiCQL-ML, a policy-free offline IRL…

Machine Learning · Computer Science · 2025-12-01 · Junsung Park

Large-Scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement Learning

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room…

Machine Learning · Computer Science · 2021-09-14 · Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, Xinghua Chai

Augmented Q Imitation Learning (AQIL)

The study of unsupervised learning can be generally divided into two categories: imitation learning and reinforcement learning. In imitation learning the machine learns by mimicking the behavior of an expert system whereas in reinforcement…

Machine Learning · Computer Science · 2020-04-07 · Xiao Lei Zhang, Anish Agarwal