Related papers: Can Transformers Capture Spatial Relations between…

A Spatial Relationship Aware Dataset for Robotics

Robotic task planning in real-world environments requires not only object recognition but also a nuanced understanding of spatial relationships between objects. We present a spatial-relationship-aware dataset of nearly 1,000 robot-acquired…

Robotics · Computer Science 2025-06-17 Peng Wang, Minh Huy Pham, Zhihao Guo, Wei Zhou

Latent Space Planning for Multi-Object Manipulation with Environment-Aware Relational Classifiers

Objects rarely sit in isolation in everyday human environments. If we want robots to operate and perform tasks in our human environments, they must understand how the objects they manipulate will interact with structural elements of the…

Robotics · Computer Science 2024-01-30 Yixuan Huang, Nichols Crawford Taylor, Adam Conkey, Weiyu Liu, Tucker Hermans

Understanding Spatial Relations through Multiple Modalities

Recognizing spatial relations and reasoning about them is essential in multiple applications including navigation, direction giving and human-computer interaction in general. Spatial relations between objects can either be explicit --…

Computation and Language · Computer Science 2020-07-21 Soham Dan, Hangfeng He, Dan Roth

Scenes and Surroundings: Scene Graph Generation using Relation Transformer

Identifying objects in an image and their mutual relationships as a scene graph leads to a deep understanding of image content. Despite the recent advancement in deep learning, the detection and labeling of visual object relationships…

Computer Vision and Pattern Recognition · Computer Science 2021-07-13 Rajat Koner, Poulami Sinhamahapatra, Volker Tresp

A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships

Transformer-based models have transformed the landscape of natural language processing (NLP) and are increasingly applied to computer vision tasks with remarkable success. These models, renowned for their ability to capture long-range…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Gracile Astlin Pereira, Muhammad Hussain

Learning Object Placements For Relational Instructions by Hallucinating Scene Representations

Robots coexisting with humans in their environment and performing services for them need the ability to interact with them. One particular requirement for such robots is that they are able to understand spatial relations and can place…

Robotics · Computer Science 2020-02-24 Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard

Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects

Though vision transformers (ViTs) have achieved state-of-the-art performance in a variety of settings, they exhibit surprising failures when performing tasks involving visual relations. This begs the question: how do ViTs attempt to perform…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Michael A. Lepori, Alexa R. Tartaglini, Wai Keen Vong, Thomas Serre, Brenden M. Lake, Ellie Pavlick

Understanding and Exploiting Object Interaction Landscapes

Interactions play a key role in understanding objects and scenes, for both virtual and real world agents. We introduce a new general representation for proximal interactions among physical objects that is agnostic to the type of objects or…

Graphics · Computer Science 2016-11-09 Sören Pirk, Vojtech Krs, Kaimo Hu, Suren Deepak Rajasekaran, Hao Kang, Bedrich Benes, Yusuke Yoshiyasu, Leonidas J. Guibas

Where is My Stuff? An Interactive System for Spatial Relations

In this paper we present a system that detects and tracks objects and agents, computes spatial relations, and communicates those relations to the user using speech. Our system is able to detect multiple objects and agents at 30 frames per…

Robotics · Computer Science 2019-09-16 E. Akin Sisbot, Jonathan H. Connell

Discovering Spatial Relationships by Transformers for Domain Generalization

Due to the rapid increase in the diversity of image data, the problem of domain generalization has received increased attention recently. While domain generalization is a challenging problem, it has achieved great development thanks to the…

Computer Vision and Pattern Recognition · Computer Science 2021-10-14 Cuicui Kang, Karthik Nandakumar

Grounding Spatio-Temporal Language with Transformers

Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the…

Artificial Intelligence · Computer Science 2021-10-12 Tristan Karch, Laetitia Teodorescu, Katja Hofmann, Clément Moulin-Frier, Pierre-Yves Oudeyer

Metric Learning for Generalizing Spatial Relations to New Objects

Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize…

Robotics · Computer Science 2017-07-25 Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

Understanding the spatial relations between objects in images is a surprisingly challenging task. A chair may be "behind" a person even if it appears to the left of the person in the image (depending on which way the person is facing). Two…

Computer Vision and Pattern Recognition · Computer Science 2019-09-02 Kaiyu Yang, Olga Russakovsky, Jia Deng

Structured Spatial Reasoning with Open Vocabulary Object Detectors

Reasoning about spatial relationships between objects is essential for many real-world robotic tasks, such as fetch-and-delivery, object rearrangement, and object search. The ability to detect and disambiguate different objects and identify…

Computer Vision and Pattern Recognition · Computer Science 2024-10-11 Negar Nejatishahidin, Madhukar Reddy Vongala, Jana Kosecka

Learning-based Relational Object Matching Across Views

Intelligent robots require object-level scene understanding to reason about possible tasks and interactions with the environment. Moreover, many perception tasks such as scene reconstruction, image retrieval, or place recognition can…

Computer Vision and Pattern Recognition · Computer Science 2023-05-05 Cathrin Elich, Iro Armeni, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler

SpatialSim: Recognizing Spatial Configurations of Objects with Graph Neural Networks

Recognizing precise geometrical configurations of groups of objects is a key capability of human spatial cognition, yet little studied in the deep learning literature so far. In particular, a fundamental problem is how a machine can learn…

Machine Learning · Computer Science 2020-07-20 Laetitia Teodorescu, Katja Hofmann, Pierre-Yves Oudeyer

Commonsense Spatial Reasoning for Visually Intelligent Agents

Service robots are expected to reliably make sense of complex, fast-changing environments. From a cognitive standpoint, they need the appropriate reasoning capabilities and background knowledge required to exhibit human-like Visual…

Artificial Intelligence · Computer Science 2021-04-02 Agnese Chiatti, Gianluca Bardaro, Enrico Motta, Enrico Daga

Recognizing Objects In-the-wild: Where Do We Stand?

The ability to recognize objects is an essential skill for a robotic system acting in human-populated environments. Despite decades of effort from the robotic and vision research communities, robots are still missing good visual perceptual…

Robotics · Computer Science 2018-05-23 Mohammad Reza Loghmani, Barbara Caputo, Markus Vincze

Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations

Humans use semantic concepts such as spatial relations between objects to describe scenes and communicate tasks such as "Put the tea to the right of the cup" or "Move the plate between the fork and the spoon." Just as children, assistive…

Robotics · Computer Science 2023-05-17 Rainer Kartmann, Tamim Asfour

Discovering objects and their relations from entangled scene representations

Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by…

Machine Learning · Computer Science 2017-02-17 David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, Peter Battaglia