Related papers: Data-Efficient Quadratic Q-Learning Using LMIs

Data-Driven LQR using Reinforcement Learning and Quadratic Neural Networks

This paper introduces a novel data-driven approach to design a linear quadratic regulator (LQR) using a reinforcement learning (RL) algorithm that does not require a system model. The key contribution is to perform policy iteration (PI) by…

Systems and Control · Electrical Eng. & Systems 2023-11-20 Soroush Asri , Luis Rodrigues

Supplemental Material For "Primal-Dual Q-Learning Framework for LQR Design"

Recently, reinforcement learning (RL) has been receiving more and more attention due to its successful demonstrations of outperforming humans in certain challenging tasks. In our recent paper "primal-dual Q-learning framework for LQR…

Optimization and Control · Mathematics 2018-11-22 Donghwan Lee , Jianghai Hu

Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations

This paper studies data-driven approaches to the continuous-time linear quadratic regulator (LQR) problem based on two existing parameterizations, namely a closed-loop (CL) parameterization from behavioral system theory and an integral…

Optimization and Control · Mathematics 2026-05-01 Armin Gießler , Felix Thömmes , Sören Hohmann

Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems

This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages…

Systems and Control · Electrical Eng. & Systems 2023-04-03 Victor G. Lopez , Mohammad Alsalti , Matthias A. Müller

Accelerating Quadratic Optimization with Reinforcement Learning

First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges:…

Machine Learning · Computer Science 2021-07-23 Jeffrey Ichnowski , Paras Jain , Bartolomeo Stellato , Goran Banjac , Michael Luo , Francesco Borrelli , Joseph E. Gonzalez , Ion Stoica , Ken Goldberg

Regularized Q-Learning with Linear Function Approximation

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi , Alfredo Garcia , Petar Momcilovic

ShiQ: Bringing back Bellman to LLMs

The fine-tuning of pre-trained large language models (LLMs) using reinforcement learning (RL) is generally formulated as direct policy optimization. This approach was naturally favored as it efficiently improves a pretrained LLM, seen as an…

Machine Learning · Computer Science 2025-05-19 Pierre Clavier , Nathan Grinsztajn , Raphael Avalos , Yannis Flet-Berliac , Irem Ergun , Omar D. Domingues , Eugene Tarassov , Olivier Pietquin , Pierre H. Richemond , Florian Strub , Matthieu Geist

Linear-Quadratic Problems in Systems and Controls via Covariance Representations and Linear-Conic Duality: Finite-Horizon Case

Linear-Quadratic (LQ) problems that arise in systems and controls include the classical optimal control problems of the Linear Quadratic Regulator (LQR) in both its deterministic and stochastic forms, as well as $H^\infty$-analysis (the…

Systems and Control · Electrical Eng. & Systems 2024-01-04 Bassam Bamieh

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected,…

Machine Learning · Computer Science 2020-08-20 Aviral Kumar , Aurick Zhou , George Tucker , Sergey Levine

Sufficient Exploration for Convex Q-learning

In recent years there has been a collective research effort to find new formulations of reinforcement learning that are simultaneously more efficient and more amenable to analysis. This paper concerns one approach that builds on the linear…

Optimization and Control · Mathematics 2022-10-19 Fan Lu , Prashant Mehta , Sean Meyn , Gergely Neu

Meta-Q-Learning

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if…

Machine Learning · Computer Science 2020-04-07 Rasool Fakoor , Pratik Chaudhari , Stefano Soatto , Alexander J. Smola

Stochastic Linear Quadratic Optimal Control Problem: A Reinforcement Learning Method

This paper applies a reinforcement learning (RL) method to solve infinite horizon continuous-time stochastic linear quadratic problems, where drift and diffusion terms in the dynamics may depend on both the state and control. Based on…

Optimization and Control · Mathematics 2021-09-17 Na Li , Xun Li , Jing Peng , Zuo Quan Xu

Robust Reinforcement Learning using Offline Data

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors,…

Machine Learning · Computer Science 2022-10-19 Kishan Panaganti , Zaiyan Xu , Dileep Kalathil , Mohammad Ghavamzadeh

Q-learning for Optimal Control of Continuous-time Systems

In this paper, two Q-learning (QL) methods are proposed and their convergence theories are established for addressing the model-free optimal control problem of general nonlinear continuous-time systems. By introducing the Q-function for…

Systems and Control · Computer Science 2014-10-14 Biao Luo , Derong Liu , Tingwen Huang

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an…

Machine Learning · Computer Science 2026-03-05 Shengbo Wang

Stochastic Primal-Dual Q-Learning

In this work, we present a new model-free and off-policy reinforcement learning (RL) algorithm, that is capable of finding a near-optimal policy with state-action observations from arbitrary behavior policies. Our algorithm, called the…

Optimization and Control · Mathematics 2025-07-21 Narim Jeong , Donghwan Lee , Niao He

Logistic Q-Learning

We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The method is closely related to the classic Relative Entropy Policy Search (REPS) algorithm of Peters…

Machine Learning · Computer Science 2021-03-01 Joan Bas-Serrano , Sebastian Curi , Andreas Krause , Gergely Neu

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type…

Machine Learning · Computer Science 2026-05-11 Gugan Thoppe , L. A. Prashanth , Ankur Naskar , Sanjay Bhat

Optimal Observer Design Using Reinforcement Learning and Quadratic Neural Networks

This paper introduces an innovative approach based on policy iteration (PI), a reinforcement learning (RL) algorithm, to obtain an optimal observer with a quadratic cost function. This observer is designed for systems with a given…

Systems and Control · Electrical Eng. & Systems 2023-11-29 Soroush Asri , Luis Rodrigues

An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems

In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No…

Systems and Control · Electrical Eng. & Systems 2024-08-21 Mohammad Alsalti , Victor G. Lopez , Matthias A. Müller