Related papers: Value Function Decomposition in Markov Recommendat…

Future Impact Decomposition in Request-level Recommendations

In recommender systems, reinforcement learning solutions have shown promising results in optimizing the interaction sequence between users and the system over the long-term performance. For practical reasons, the policy's actions are…

Information Retrieval · Computer Science 2024-06-19 Xiaobei Wang, Shuchang Liu, Xueliang Wang, Qingpeng Cai, Lantao Hu, Han Li, Peng Jiang, Kun Gai, Guangming Xie

Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction

Value functions are crucial for model-free Reinforcement Learning (RL) to obtain a policy implicitly or guide the policy updates. Value estimation heavily depends on the stochasticity of environmental dynamics and the quality of reward…

Machine Learning · Computer Science 2019-05-28 Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Zhaopeng Meng, Yaodong Yang, Li Wang

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods provide too few tools to provide insight into the…

Machine Learning · Computer Science 2022-10-24 James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information…

Machine Learning · Computer Science 2021-04-20 Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic

Sequential Recommendation with User Evolving Preference Decomposition

Modeling user sequential behaviors has recently attracted increasing attention in the recommendation domain. Existing methods mostly assume coherent preference in the same sequence. However, user personalities are volatile and easily…

Information Retrieval · Computer Science 2022-04-01 Weiqi Shao, Xu Chen, Long Xia, Jiashu Zhao, Dawei Yin

Multi-agent Markov Entanglement

Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1,s_2,\ldots,s_N)$ is often approximated as the sum of…

Machine Learning · Computer Science 2025-11-14 Shuze Chen, Tianyi Peng

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be…

Machine Learning · Computer Science 2021-03-04 Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Zhaopeng Meng

Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting

Distributional reinforcement learning (RL) is a powerful framework increasingly adopted in safety-critical domains for its ability to optimize risk-sensitive objectives. However, the role of the discount factor is often overlooked, as it is…

Machine Learning · Computer Science 2026-02-05 Mehrdad Moghimi, Anthony Coache, Hyejin Ku

Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex

This study addresses the challenges of dynamics and complexity in intelligent human-computer interaction and proposes a reinforcement learning-based optimization framework to improve long-term returns and overall experience. Human-computer…

Human-Computer Interaction · Computer Science 2025-11-03 Rui Liu, Yifan Zhuang, Runsheng Zhang

Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning

Recently, model-based agents have achieved better performance than model-free ones using the same computational budget and training time in single-agent environments. However, due to the complexity of multi-agent systems, it is tough to…

Multiagent Systems · Computer Science 2022-12-08 Zhiwei Xu, Dapeng Li, Bin Zhang, Yuan Zhan, Yunpeng Bai, Guoliang Fan

Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

In classic reinforcement learning (RL) and decision making problems, policies are evaluated with respect to a scalar reward function, and all optimal policies are the same with regards to their expected return. However, many real-world…

Machine Learning · Computer Science 2023-11-02 Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of…

Information Retrieval · Computer Science 2023-08-01 Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu

Towards Return Parity in Markov Decision Processes

Algorithmic decisions made by machine learning models in high-stakes domains may have lasting impacts over time. However, naive applications of standard fairness criteria in static settings over temporal domains may lead to delayed and…

Machine Learning · Computer Science 2022-03-01 Jianfeng Chi, Jian Shen, Xinyi Dai, Weinan Zhang, Yuan Tian, Han Zhao

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Deep Reinforcement Learning of Marked Temporal Point Processes

In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such…

Machine Learning · Computer Science 2018-11-07 Utkarsh Upadhyay, Abir De, Manuel Gomez-Rodriguez

Accelerating Multi-Task Temporal Difference Learning under Low-Rank Representation

We study policy evaluation problems in multi-task reinforcement learning (RL) under a low-rank representation setting. In this setting, we are given $N$ learning tasks where the corresponding value functions of these tasks lie in an…

Machine Learning · Computer Science 2025-03-05 Yitao Bai, Sihan Zeng, Justin Romberg, Thinh T. Doan

Towards Learning Reward Functions from User Interactions

In the physical world, people have dynamic preferences, e.g., the same situation can lead to satisfaction for some humans and to frustration for others. Personalization is called for. The same observation holds for online behavior with…

Information Retrieval · Computer Science 2017-08-16 Ziming Li, Julia Kiseleva, Maarten de Rijke, Artem Grotov

Constrained Markov Decision Processes via Backward Value Functions

Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety…

Machine Learning · Computer Science 2020-08-28 Harsh Satija, Philip Amortila, Joelle Pineau

VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model

In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model to address the challenge of achieving a common goal of multiple agents interacting…

Machine Learning · Computer Science 2025-09-29 Zhizun Wang, David Meger