Deceptive Sequential Decision-Making via Regularized Policy Optimization

Yerin Kim; Alexander Benvenuti; Bo Chen; Mustafa Karabag; Abhishek Kulkarni; Nathaniel D. Bastian; Ufuk Topcu; Matthew Hale

Machine Learning 2026-01-22 v3 Optimization and Control

Abstract

Autonomous systems are increasingly expected to operate in the presence of adversaries, though adversaries may infer sensitive information simply by observing a system. Therefore, we present a deceptive sequential decision-making framework that not only conceals sensitive information, but actively misleads adversaries about it. We model autonomous systems as Markov decision processes, with adversaries using inverse reinforcement learning to recover reward functions. To counter them, we present three regularization strategies for policy synthesis problems that actively deceive an adversary about a system's reward. "Diversionary deception" leads an adversary to draw any false conclusion about the system's reward function. "Targeted deception" leads an adversary to draw a specific false conclusion about the system's reward function. "Equivocal deception" leads an adversary to infer that the real reward and a false reward both explain the system's behavior. We show how each form of deception can be implemented in policy optimization problems and analytically bound the loss in total accumulated reward induced by deception. Next, we evaluate these developments in a multi-agent setting. We show that diversionary, targeted, and equivocal deception all steer the adversary to false beliefs while still attaining a total accumulated reward that is at least 98% of its optimal, non-deceptive value.
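To make the trade-off between deception and accumulated reward concrete, here is a minimal toy sketch. It is not the paper's formulation: the four-state chain MDP, the rewards `R_TRUE` and `R_DECOY`, the blending weight `lam`, and all function names are hypothetical, and blending the true reward with a decoy reward is only a simple stand-in for the regularizers described above. The sketch plans against the blended reward and reports what fraction of the optimal true return the resulting policy keeps.

```python
# Toy sketch only: every number and name here is an illustrative assumption.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9
R_TRUE = [0.0, 0.0, 0.0, 1.0]   # hypothetical true reward: goal at state 3
R_DECOY = [1.0, 0.0, 0.0, 0.0]  # hypothetical decoy reward: goal at state 0
START = 1


def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)


def greedy_policy(reward, iters=500):
    """Value iteration, then the greedy policy for a state-based reward."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(reward[s] + GAMMA * v[step(s, a)] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return [max(range(N_ACTIONS),
                key=lambda a: reward[s] + GAMMA * v[step(s, a)])
            for s in range(N_STATES)]


def evaluate(policy, reward, iters=500):
    """Discounted return of a fixed policy from every state."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [reward[s] + GAMMA * v[step(s, policy[s])]
             for s in range(N_STATES)]
    return v


def true_return_fraction(lam):
    """Plan for (1 - lam) * R_TRUE + lam * R_DECOY, score under R_TRUE."""
    blended = [(1 - lam) * rt + lam * rd for rt, rd in zip(R_TRUE, R_DECOY)]
    deceptive = evaluate(greedy_policy(blended), R_TRUE)[START]
    optimal = evaluate(greedy_policy(R_TRUE), R_TRUE)[START]
    return deceptive / optimal


if __name__ == "__main__":
    for lam in (0.0, 0.3, 0.5, 1.0):
        print(lam, true_return_fraction(lam))
```

In this toy chain the policy switches abruptly from the true goal to the decoy goal as `lam` grows, so it does not reproduce the small reward loss reported in the abstract; it only illustrates how a deception weight trades accumulated true reward against behavior that points toward a false reward.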

Keywords

Bayesian persuasion, reinforcement learning, Markov decision processes

Cite

@article{arxiv.2501.18803,
  title  = {Deceptive Sequential Decision-Making via Regularized Policy Optimization},
  author = {Yerin Kim and Alexander Benvenuti and Bo Chen and Mustafa Karabag and Abhishek Kulkarni and Nathaniel D. Bastian and Ufuk Topcu and Matthew Hale},
  journal= {arXiv preprint arXiv:2501.18803},
  year   = {2026}
}

Comments

18 pages, 5 figures
