Visual Semantic Planning using Deep Successor Representations

Yuke Zhu; Daniel Gordon; Eric Kolve; Dieter Fox; Li Fei-Fei; Abhinav Gupta; Roozbeh Mottaghi; Ali Farhadi

Subjects: Computer Vision and Pattern Recognition; Machine Learning; Robotics (2017-08-17, v2)

Abstract

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects and their affordances, as well as actions and their preconditions and effects. We propose learning these through interacting with a visual and dynamic environment. Our proposed solution involves bootstrapping reinforcement learning with imitation learning. To ensure cross-task generalization, we develop a deep predictive model based on successor representations. Our experiments show near-optimal performance across a wide range of tasks in the challenging THOR environment.
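
As a rough illustration of the successor-representation idea mentioned in the abstract, the sketch below computes a tabular successor representation for a fixed policy and reuses it to evaluate two different reward functions. This is a toy under stated assumptions, not the deep model proposed in the paper: the function names, the three-state chain, the discount factor, and the two reward vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np


def successor_representation(P, gamma):
    """Tabular SR for a fixed policy: M = sum_t gamma^t P^t = (I - gamma P)^-1.

    P is the state-to-state transition matrix under that policy.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def values_from_sr(M, reward):
    """State values are linear in the reward vector: V = M r."""
    return M @ reward


# Hypothetical 3-state chain: 0 -> 1 -> 2, with state 2 absorbing.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])
gamma = 0.9

# The SR depends only on the dynamics, so it is computed once.
M = successor_representation(P, gamma)

# Two illustrative "tasks" that differ only in where reward is given.
reward_task_a = np.array([0.0, 0.0, 1.0])
reward_task_b = np.array([0.0, 1.0, 0.0])

v_a = values_from_sr(M, reward_task_a)
v_b = values_from_sr(M, reward_task_b)

print("values, task A:", v_a)
print("values, task B:", v_b)
```

Because the dynamics term M is separated from the reward vector, switching tasks in this toy only requires a new reward vector. That separation is the property that makes successor representations attractive for cross-task transfer.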

Keywords

representation learning, visual grounding, computer vision

Cite

@article{arxiv.1705.08080,
  title  = {Visual Semantic Planning using Deep Successor Representations},
  author = {Yuke Zhu and Daniel Gordon and Eric Kolve and Dieter Fox and Li Fei-Fei and Abhinav Gupta and Roozbeh Mottaghi and Ali Farhadi},
  journal= {arXiv preprint arXiv:1705.08080},
  year   = {2017}
}

Comments

ICCV 2017 camera ready
