Learning Memory-Dependent Continuous Control from Demonstrations

Machine Learning 2021-02-19 v1 Artificial Intelligence

Abstract

Efficient exploration is a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces the number of interactions with the environment while requiring only a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
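The abstract does not specify how demonstrations and self-collected experience are combined, so the following is only a minimal sketch of the general idea of replaying both for a memory-dependent learner, not the READER algorithm itself. The class name `MixedSequenceReplay`, the `demo_fraction` mixing ratio, and the transition layout are all hypothetical. Contiguous windows are sampled instead of single transitions so that a recurrent actor-critic could be unrolled over the observation history.

```python
import random
from collections import deque


class MixedSequenceReplay:
    """Illustrative buffer (hypothetical, not from the paper).

    Stores whole episodes and samples fixed-length contiguous windows
    from demonstrations and from the agent's own experience.
    """

    def __init__(self, capacity=1000, demo_fraction=0.25, seed=None):
        self.demos = []                            # demonstration episodes, never evicted
        self.experience = deque(maxlen=capacity)   # agent episodes, oldest evicted first
        self.demo_fraction = demo_fraction         # assumed mixing ratio
        self.rng = random.Random(seed)

    def add_demo(self, episode):
        self.demos.append(list(episode))

    def add_experience(self, episode):
        self.experience.append(list(episode))

    def _window(self, episodes, length):
        # Pick one episode, then a contiguous slice of at most `length` steps.
        episode = self.rng.choice(episodes)
        start = self.rng.randint(0, max(0, len(episode) - length))
        return episode[start:start + length]

    def sample(self, batch_size, length):
        if not self.demos and not self.experience:
            raise ValueError("replay is empty")
        if not self.experience:
            n_demo = batch_size        # only demonstrations available so far
        elif not self.demos:
            n_demo = 0                 # no demonstrations were provided
        else:
            n_demo = round(batch_size * self.demo_fraction)
        batch = [self._window(self.demos, length) for _ in range(n_demo)]
        batch += [self._window(self.experience, length)
                  for _ in range(batch_size - n_demo)]
        return batch
```

In this sketch each episode is a list of transitions (for example `(observation, action, reward)` tuples), and every sampled window preserves the original time order, which is what a recurrent network needs in order to rebuild its hidden state from past observations.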

Cite

@article{arxiv.2102.09208,
  title  = {Learning Memory-Dependent Continuous Control from Demonstrations},
  author = {Siqing Hou and Dongqi Han and Jun Tani},
  journal= {arXiv preprint arXiv:2102.09208},
  year   = {2021}
}

Comments

10 pages, 6 figures