Navdeep Kumar — Scifaro

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum

We establish an optimal sample complexity of $O(\epsilon^{-2})$ for obtaining an $\epsilon$-optimal global policy using a single-timescale actor-critic (AC) algorithm in infinite-horizon discounted Markov decision processes (MDPs) with…

Machine Learning · Computer Science 2026-05-08 Navdeep Kumar, Tehila Dahan, Lior Cohen, Ananyabrata Barua, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models

We study diffusion-based world models for reinforcement learning, which offer high generative fidelity but face critical efficiency challenges in control. Current methods either require heavyweight models at inference or rely on highly…

Machine Learning · Computer Science 2026-02-18 Lior Cohen, Ofir Nabati, Kaixin Wang, Navdeep Kumar, Shie Mannor

Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead

Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an…

Machine Learning · Computer Science 2025-06-10 Uri Koren, Navdeep Kumar, Uri Gadot, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

On the Convergence of Single-Timescale Actor-Critic

We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework…

Machine Learning · Computer Science 2025-06-05 Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes

We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally…

Artificial Intelligence · Computer Science 2025-02-14 Navdeep Kumar, Adarsh Gupta, Maxence Mohamed Elfatihi, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…

Machine Learning · Computer Science 2024-03-12 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance…

Machine Learning · Computer Science 2024-02-13 Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their…

Machine Learning · Computer Science 2024-02-13 Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor

Policy Gradient for Rectangular Robust Markov Decision Processes

Policy gradient methods have become a standard for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally…

Machine Learning · Computer Science 2023-12-12 Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor

Policy Gradient for Reinforcement Learning with General Utilities

In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards. This objective may also be viewed as finding a policy that optimizes a linear function of its state-action…

Machine Learning · Computer Science 2023-08-30 Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs which is significantly faster than any existing method. We do…

Machine Learning · Computer Science 2023-02-01 Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

Robust Markov decision processes (MDPs) provide a general framework to model decision problems where the system dynamics are changing or only partially known. Efficient methods for some sa-rectangular robust MDPs exist, using its…

Artificial Intelligence · Computer Science 2022-10-06 Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

The Geometry of Robust Value Functions

The space of value functions is a fundamental concept in reinforcement learning. Characterizing its geometric properties may provide insights for optimization and representation. Existing works mainly focus on the value space for Markov…

Machine Learning · Computer Science 2022-08-12 Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor