Self-Consistent Model-based Adaptation for Visual Reinforcement Learning

Computer Vision and Pattern Recognition; Machine Learning | 2025-02-17 | v1

Abstract

Visual reinforcement learning agents typically face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy. By transferring cluttered observations to clean ones with a denoising model, SCMA can mitigate distractions for various policies as a plug-and-play enhancement. To optimize the denoising model in an unsupervised manner, we derive an unsupervised distribution matching objective with a theoretical analysis of its optimality. We further present a practical algorithm to optimize the objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency.
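
The plug-and-play idea described in the abstract, a denoising model placed in front of a policy that is itself left unchanged, can be illustrated with a minimal Python sketch. This is not the authors' implementation: `DenoisedPolicy`, `toy_policy`, and `make_toy_denoiser` are hypothetical names, the toy denoiser is handed the exact distractor instead of learning it, and the distribution matching objective and world model are not shown.

```python
import numpy as np


class DenoisedPolicy:
    """Hypothetical wrapper: the policy only ever sees denoised observations."""

    def __init__(self, policy, denoiser):
        self.policy = policy      # frozen: never modified during adaptation
        self.denoiser = denoiser  # the only component that would be adapted

    def act(self, cluttered_obs):
        # Map the cluttered observation to an estimate of the clean one,
        # then query the unchanged policy on that estimate.
        clean_estimate = self.denoiser(cluttered_obs)
        return self.policy(clean_estimate)


def toy_policy(obs):
    # Stand-in for a pre-trained policy: a fixed function of the observation.
    return float(np.tanh(obs.mean()))


def make_toy_denoiser(distractor):
    # Assumption for illustration only: the distractor is additive and known.
    # In the paper the denoiser is learned without supervision.
    def denoise(obs):
        return obs - distractor
    return denoise


rng = np.random.default_rng(0)
clean_obs = rng.random((8, 8))
distractor = 0.5 * rng.random((8, 8))
cluttered_obs = clean_obs + distractor

agent = DenoisedPolicy(toy_policy, make_toy_denoiser(distractor))
action_adapted = agent.act(cluttered_obs)   # policy sees the denoised input
action_clean = toy_policy(clean_obs)        # reference action on clean input
action_naive = toy_policy(cluttered_obs)    # policy fed the distraction directly
```

Because the wrapper only composes two callables, the same denoiser could in principle be placed in front of different policies, which is the sense in which the abstract calls the approach plug-and-play.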

Cite

@article{arxiv.2502.09923,
  title  = {Self-Consistent Model-based Adaptation for Visual Reinforcement Learning},
  author = {Xinning Zhou and Chengyang Ying and Yao Feng and Hang Su and Jun Zhu},
  journal= {arXiv preprint arXiv:2502.09923},
  year   = {2025}
}