Related papers: Visual Semantic Planning using Deep Successor Repr…

Visual Robot Task Planning

Prospection, the act of predicting the consequences of many possible futures, is intrinsic to human planning and action, and may even be at the root of consciousness. Surprisingly, this idea has been explored comparatively little in…

Robotics · Computer Science 2018-04-03 Chris Paxton, Yotam Barnoy, Kapil Katyal, Raman Arora, Gregory D. Hager

Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning

Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in…

Artificial Intelligence · Computer Science 2024-03-28 Yilue Qian, Peiyu Yu, Ying Nian Wu, Yao Su, Wei Wang, Lifeng Fan

Deep Learning Driven Visual Path Prediction from a Single Image

Capabilities of inference and prediction are significant components of visual systems. In this paper, we address one important and challenging such task: visual path prediction. Its goal is to infer the future path for a visual object in…

Computer Vision and Pattern Recognition · Computer Science 2016-12-16 Siyu Huang, Xi Li, Zhongfei Zhang, Zhouzhou He, Fei Wu, Wei Liu, Jinhui Tang, Yueting Zhuang

Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation

In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that…

Robotics · Computer Science 2023-08-02 Hongyou Zhou, Ingmar Schubert, Marc Toussaint, Ozgur S. Oguz

Visual Representations for Semantic Target Driven Navigation

What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to…

Computer Vision and Pattern Recognition · Computer Science 2019-07-04 Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson

Learning Plannable Representations with Causal InfoGAN

In recent years, deep generative models have been shown to 'imagine' convincing high-dimensional observations such as images, audio, and even video, learning directly from raw data. In this work, we ask how to imagine goal-directed visual…

Machine Learning · Computer Science 2018-07-27 Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel

Semantic World Models

Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective…

Machine Learning · Computer Science 2025-10-23 Jacob Berg, Chuning Zhu, Yanda Bao, Ishan Durugkar, Abhishek Gupta

Learning and Planning with a Semantic Model

Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI. This paper describes progress on this challenge in the context of man-made environments, which are…

Machine Learning · Computer Science 2018-10-01 Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search

Physically rearranging objects is an important capability for embodied agents. Visual room rearrangement evaluates an agent's ability to rearrange objects in a room to a desired goal based solely on visual input. We propose a simple yet…

Computer Vision and Pattern Recognition · Computer Science 2022-08-11 Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Gaurav S. Sukhatme, Ruslan Salakhutdinov

Learning to Imagine Manipulation Goals for Robot Task Planning

Prospection is an important part of how humans come up with new task plans, but has not been explored in depth in robotics. Predicting multiple task-level is a challenging problem that involves capturing both task semantics and continuous…

Machine Learning · Computer Science 2017-11-13 Chris Paxton, Kapil Katyal, Christian Rupprecht, Raman Arora, Gregory D. Hager

Embodied Visual Active Learning for Semantic Segmentation

We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding by actively selecting views for which to request annotation. While accurate on some…

Computer Vision and Pattern Recognition · Computer Science 2020-12-18 David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

Self-Supervised Visual Planning with Temporal Skip Connections

In order to autonomously learn wide repertoires of complex skills, robots must be able to learn from their own autonomously collected data, without human supervision. One learning signal that is always available for autonomously collected…

Robotics · Computer Science 2017-10-18 Frederik Ebert, Chelsea Finn, Alex X. Lee, Sergey Levine

Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image

In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a…

Machine Learning · Computer Science 2020-06-11 Danny Driess, Jung-Su Ha, Marc Toussaint

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new target goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge,…

Computer Vision and Pattern Recognition · Computer Science 2016-09-19 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

Neural Network based Successor Representations of Space and Language

How does the mind organize thoughts? The hippocampal-entorhinal complex is thought to support domain-general representation and processing of structural knowledge of arbitrary state, feature and concept spaces. In particular, it enables the…

Artificial Intelligence · Computer Science 2022-02-24 Paul Stoewer, Christian Schlieker, Achim Schilling, Claus Metzner, Andreas Maier, Patrick Krauss

Visual Semantic Navigation using Scene Priors

How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? For example, to search for mugs, we search cabinets near the coffee machine and…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Vision-based Navigation Using Deep Reinforcement Learning

Deep reinforcement learning (RL) has been successfully applied to a variety of game-like environments. However, the application of deep RL to visual navigation with realistic environments is a challenging task. We propose a novel learning…

Robotics · Computer Science 2019-11-12 Jonáš Kulhánek, Erik Derner, Tim de Bruin, Robert Babuška

Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

Robotics · Computer Science 2025-05-28 Nikos Giannakakis, Argyris Manetas, Panagiotis P. Filntisis, Petros Maragos, George Retsinas

Learning to Map for Active Semantic Goal Navigation

We consider the problem of object goal navigation in unseen environments. Solving this problem requires learning of contextual semantic priors, a challenging endeavour given the spatial and semantic variability of indoor environments.…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Georgios Georgakis, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Kostas Daniilidis

Object-oriented Targets for Visual Navigation using Rich Semantic Representations

When searching for an object, humans navigate through a scene using semantic information and spatial relationships. We look for an object using our knowledge of its attributes and relationships with other objects to infer the probable…

Computer Vision and Pattern Recognition · Computer Science 2018-12-18 Jean-Benoit Delbrouck, Stéphane Dupont