Related papers

Related papers: A Reduction-based Framework for Sequential Decisio…

200 papers

Equitably allocating limited resources in high-stakes domains, such as education, employment, and healthcare, requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and…

Artificial Intelligence · Computer Science 2025-11-17 Mohammadsina Almasi, Hadis Anahideh

For marketing, we sometimes need to recommend content for multiple pages in sequence. Unlike a general sequential decision-making process, these use cases have a simpler flow where customers per seeing recommended content on each page…

Machine Learning · Computer Science 2022-03-18 Wenjun Zeng, Yi Liu

Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed in delay. This paper studies online learning in…

Machine Learning · Computer Science 2021-12-16 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

A standard assumption in Reinforcement Learning is that the agent observes every visited state-action pair in the associated Markov Decision Process (MDP), along with the per-step rewards. Strong theoretical results are known in this…

Machine Learning · Computer Science 2026-02-03 Zhengjia Zhuo, Anupam Gupta, Viswanath Nagarajan

Learning in multi-player games can model a large variety of practical scenarios, where each player seeks to optimize its own local objective function, which at the same time relies on the actions taken by others. Motivated by the frequent…

Optimization and Control · Mathematics 2023-09-08 Yuanhanqing Huang, Jianghai Hu

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of a delay-aware Markov Decision Process and proves it can be transformed into a standard MDP with augmented…

Machine Learning · Computer Science 2021-05-10 Baiming Chen, Mengdi Xu, Liang Li, Ding Zhao

A survey is performed of various Multi-Armed Bandit (MAB) strategies in order to examine their performance in circumstances exhibiting non-stationary stochastic reward functions in conjunction with delayed feedback. We run several MAB…

Machine Learning · Computer Science 2019-07-31 Larkin Liu, Richard Downe, Joshua Reid

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed in delay. This paper studies online learning in episodic Markov decision…

Machine Learning · Computer Science 2023-01-24 Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Decision-making problems of sequential nature, where decisions made in the past may have an impact on the future, are used to model many practically important applications. In some real-world applications, feedback about a decision is…

Machine Learning · Computer Science 2023-03-02 Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample…

Several real-world scenarios, such as remote control and sensing, are comprised of action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that…

Machine Learning · Computer Science 2021-08-18 Somjit Nath, Mayank Baranwal, Harshad Khadilkar

In order to make good decisions under uncertainty, an agent must learn from observations. To do so, two of the most common frameworks are Contextual Bandits and Markov Decision Processes (MDPs). In this paper, we study whether there exist…

Machine Learning · Computer Science 2019-11-05 Andrea Zanette, Emma Brunskill

The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where…

Systems and Control · Electrical Eng. & Systems 2024-06-14 Donghwan Lee, Han-Dong Lim, Do Wan Kim

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by…

We study online learning in finite-horizon episodic Markov decision processes (MDPs) under the challenging aggregate bandit feedback model, where the learner observes only the cumulative loss incurred in each episode, rather than individual…

Machine Learning · Computer Science 2025-10-28 Shinji Ito, Kevin Jamieson, Haipeng Luo, Arnab Maiti, Taira Tsuchiya

Artificial behavioral agents are often evaluated based on their consistent behaviors and performance to take sequential actions in an environment to maximize some notion of cumulative reward. However, human decision making in real life…

Artificial Intelligence · Computer Science 2021-12-28 Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish

We study a system with finitely many groups of multi-action bandit processes, each of which is a Markov decision process (MDP) with finite state and action spaces and potentially different transition matrices when taking different actions.…

Optimization and Control · Mathematics 2024-12-05 Jing Fu, Bill Moran, José Niño-Mora

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…

Optimization and Control · Mathematics 2015-07-07 Mahmoud El Chamie, Behcet Acikmese