Related papers: Flow Q-Learning

Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

Offline safe reinforcement learning (RL) seeks reward-maximizing policies from static datasets under strict safety constraints. Existing methods often rely on soft expected-cost objectives or iterative generative inference, which can be…

Machine Learning · Computer Science 2026-03-17 Mumuksh Tayal, Manan Tayal, Ravi Prakash

One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning

Diffusion Q-Learning (DQL) has established diffusion policies as a high-performing paradigm for offline reinforcement learning, but its reliance on multi-step denoising for action generation renders both training and inference slow and…

Machine Learning · Computer Science 2026-02-25 Thanh Nguyen, Chang D. Yoo

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function…

Machine Learning · Computer Science 2023-08-29 Zhendong Wang, Jonathan J Hunt, Mingyuan Zhou

Causal Flow Q-Learning for Robust Offline Reinforcement Learning

Expressive policies based on flow-matching have been successfully applied in reinforcement learning (RL) more recently due to their ability to model complex action distributions from offline data. These algorithms build on standard policy…

Machine Learning · Computer Science 2026-02-04 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Flow-Based Policy for Online Reinforcement Learning

We present FlowRL, a novel framework for online reinforcement learning that integrates flow-based policy representation with Wasserstein-2-regularized optimization. We argue that in addition to training signals, enhancing the…

Machine Learning · Computer Science 2025-06-17 Lei Lv, Yunfei Li, Yu Luo, Fuchun Sun, Tao Kong, Jiafeng Xu, Xiao Ma

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

There is growing interest in utilizing flow-based models as decision-making policies in reinforcement learning due to their high expressive capacity. However, effectively leveraging this expressivity for value maximization remains…

Machine Learning · Computer Science 2026-05-14 JaeHyeok Doo, Byeongguk Jeon, Seonghyeon Ye, Kimin Lee, Minjoon Seo

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to…

Machine Learning · Computer Science 2021-10-13 Ilya Kostrikov, Ashvin Nair, Sergey Levine

Aligning Flow Map Policies with Optimal Q-Guidance

Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant…

Machine Learning · Computer Science 2026-05-13 Christos Ziakas, Alessandra Russo, Avishek Joey Bose

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

We propose Flow-Anchored Noise-conditioned Q-Learning (FAN), a highly efficient and high-performing offline reinforcement learning (RL) algorithm. Recent work has shown that expressive flow policies and distributional critics improve…

Machine Learning · Computer Science 2026-05-29 Sungyoung Lee, Dohyeong Kim, Eshan Balachandar, Zelal Su Mustafaoglu, Keshav Pingali

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected,…

Machine Learning · Computer Science 2020-08-20 Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow

We introduce a one-step generative policy for offline reinforcement learning that maps noise directly to actions via a residual reformulation of MeanFlow, making it compatible with Q-learning. While one-step Gaussian policies enable fast…

Machine Learning · Computer Science 2025-11-18 Zeyuan Wang, Da Li, Yulin Chen, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu

Flow Matching for Offline Reinforcement Learning with Discrete Actions

Generative policies based on diffusion models and flow matching have shown strong promise for offline reinforcement learning (RL), but their applicability remains largely confined to continuous action spaces. To address a broader range of…

Machine Learning · Computer Science 2026-05-14 Fairoz Nower Khan, Nabuat Zaman Nahim, Ruiquan Huang, Haibo Yang, Peizhong Ju

Boosting Continuous Control with Consistency Policy

Due to its training stability and strong expression, the diffusion model has attracted considerable attention in offline reinforcement learning. However, several challenges have also come with it: 1) The demand for a large number of…

Machine Learning · Computer Science 2024-01-25 Yuhui Chen, Haoran Li, Dongbin Zhao

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching…

Machine Learning · Computer Science 2023-09-14 Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella, John Dolan, Jeff Schneider, Glen Berseth

Diffusion Policies creating a Trust Region for Offline Reinforcement Learning

Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of…

Machine Learning · Computer Science 2024-11-04 Tianyu Chen, Zhendong Wang, Mingyuan Zhou

Evolving Diffusion and Flow Matching Policies for Online Reinforcement Learning

Diffusion and flow matching policies offer expressive, multimodal action modeling, yet they are frequently unstable in online reinforcement learning (RL) due to intractable likelihoods and gradients propagating through long sampling chains.…

Machine Learning · Computer Science 2026-03-10 Chubin Zhang, Zhenglin Wan, Feng Chen, Fuchao Yang, Lang Feng, Yaxin Zhou, Xingrui Yu, Yang You, Ivor Tsang, Bo An

Efficient Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL significantly boosts the performance of offline RL by…

Machine Learning · Computer Science 2023-10-27 Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan

Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

Offline goal-conditioned reinforcement learning (GCRL) is a practical reinforcement learning paradigm that aims to learn goal-conditioned policies from reward-free offline data. Despite recent advances in hierarchical architectures such as…

Machine Learning · Computer Science 2026-04-13 Zhiqiang Dong, Teng Pang, Rongjian Xu, Guoqiang Wu

Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression

Offline reinforcement learning (RL) enables policy learning from fixed datasets without further environment interaction, making it particularly valuable in high-risk or costly domains. Extreme Q-Learning (XQL) is a recent offline RL…

Machine Learning · Computer Science 2026-04-15 Xinming Gao, Shangzhe Li, Yujin Cai, Wenwu Yu

Expressive Value Learning for Scalable Offline Reinforcement Learning

Reinforcement learning (RL) is a powerful paradigm for learning to make sequences of decisions. However, RL has yet to be fully leveraged in robotics, principally due to its lack of scalability. Offline RL offers a promising avenue by…

Machine Learning · Computer Science 2025-10-10 Nicolas Espinosa-Dice, Kiante Brantley, Wen Sun