Improving Controller Generalization with Dimensionless Markov Decision Processes

Machine Learning 2025-04-15 v1

Abstract

Controllers trained with reinforcement learning tend to be highly specialized and therefore generalize poorly when the test environment differs from the training environment. We propose a model-based approach to improve generalization, in which both the world model and the policy are trained in a dimensionless state-action space. To do so, we introduce the Dimensionless Markov Decision Process (Π-MDP): an extension of Contextual MDPs in which the state and action spaces are non-dimensionalized using the Buckingham-Π theorem. This procedure induces policies that are equivariant with respect to changes in the context of the underlying dynamics. We provide a generic framework for this approach and apply it to a model-based policy search algorithm using Gaussian Process models. We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained on a single environment are robust to shifts in the distribution of the context.
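To make the idea of a dimensionless state-action space concrete, the following is a minimal sketch for a frictionless actuated pendulum. It is not the authors' implementation. The context variables (mass `m`, length `l`, gravity `g`), the chosen scales (time scale `sqrt(l/g)`, torque scale `m*g*l`), the Euler integrator, and all function names are illustrative assumptions. The sketch shows that two pendulums with different contexts follow the same transition once state, action, and time step are expressed in dimensionless form.

```python
import math

# Assumed context: mass m [kg], length l [m], gravity g [m/s^2].
# Assumed scales: time sqrt(l/g), torque m*g*l (hypothetical choice).

def to_dimensionless(theta, theta_dot, torque, m, l, g=9.81):
    """Map a dimensional state and action to dimensionless quantities."""
    t_scale = math.sqrt(l / g)
    return theta, theta_dot * t_scale, torque / (m * g * l)

def from_dimensionless(theta, omega_star, u_star, m, l, g=9.81):
    """Inverse of to_dimensionless for a given context."""
    t_scale = math.sqrt(l / g)
    return theta, omega_star / t_scale, u_star * m * g * l

def euler_step(theta, theta_dot, torque, dt, m, l, g=9.81):
    """One explicit Euler step of m*l^2*theta'' = -m*g*l*sin(theta) + torque."""
    theta_ddot = -(g / l) * math.sin(theta) + torque / (m * l**2)
    return theta + dt * theta_dot, theta_dot + dt * theta_ddot

def dimensionless_step(pi_state, pi_action, dtau, m, l, g=9.81):
    """Apply a dimensionless action in a given context, return the next dimensionless state."""
    theta, omega_star = pi_state
    theta, theta_dot, torque = from_dimensionless(theta, omega_star, pi_action, m, l, g)
    dt = dtau * math.sqrt(l / g)  # rescale the time step as well
    theta_next, theta_dot_next = euler_step(theta, theta_dot, torque, dt, m, l, g)
    theta_s, omega_s, _ = to_dimensionless(theta_next, theta_dot_next, torque, m, l, g)
    return theta_s, omega_s

# Same dimensionless state and action, two different contexts.
next_a = dimensionless_step((0.3, 0.5), 0.2, 0.01, m=1.0, l=1.0)
next_b = dimensionless_step((0.3, 0.5), 0.2, 0.01, m=5.0, l=0.25)
print(next_a, next_b)
```

Under these assumptions the two printed states agree up to floating-point error, which is the property a policy defined on dimensionless inputs and outputs can exploit when the context changes.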

Cite

@article{arxiv.2504.10006,
  title  = {Improving Controller Generalization with Dimensionless Markov Decision Processes},
  author = {Valentin Charvet and Sebastian Stein and Roderick Murray-Smith},
  journal= {arXiv preprint arXiv:2504.10006},
  year   = {2025}
}

Comments

11 pages, 5 figures