Related papers: Understanding Spatial Relations through Multiple M…

Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates

Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit…

Artificial Intelligence · Computer Science 2020-03-09 Guillem Collell, Luc Van Gool, Marie-Francine Moens

A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation

Over the last two decades we have witnessed strong progress on modeling visual object classes, scenes and attributes that have significantly contributed to automated image understanding. On the other hand, surprisingly little progress has…

Computer Vision and Pattern Recognition · Computer Science 2015-05-06 Mateusz Malinowski, Mario Fritz

Can Transformers Capture Spatial Relations between Objects?

Spatial relationships between objects represent key scene information for humans to understand and interact with the world. To study the capability of current computer vision systems to recognize physically grounded spatial relations, we…

Computer Vision and Pattern Recognition · Computer Science 2024-03-04 Chuan Wen, Dinesh Jayaraman, Yang Gao

Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations

Humans use semantic concepts such as spatial relations between objects to describe scenes and communicate tasks such as "Put the tea to the right of the cup" or "Move the plate between the fork and the spoon." Just as children, assistive…

Robotics · Computer Science 2023-05-17 Rainer Kartmann, Tamim Asfour

Perspective alignment in spatial language

It is well known that perspective alignment plays a major role in the planning and interpretation of spatial language. In order to understand the role of perspective alignment and the cognitive processes involved, we have made precise…

Artificial Intelligence · Computer Science 2008-02-13 L. Steels, M. Loetzsch

Improving Information Extraction from Images with Learned Semantic Models

Many applications require an understanding of an image that goes beyond the simple detection and classification of its objects. In particular, a great deal of semantic information is carried in the relationships between objects. We have…

Artificial Intelligence · Computer Science 2018-08-28 Stephan Baier, Yunpu Ma, Volker Tresp

Inferring spatial relations from textual descriptions of images

Generating an image from its textual description requires both a certain level of language understanding and common sense knowledge about the spatial relations of the physical entities being described. In this work, we focus on inferring…

Artificial Intelligence · Computer Science 2021-02-03 Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

From Spatial Relations to Spatial Configurations

Spatial Reasoning from language is essential for natural language understanding. Supporting it requires a representation scheme that can capture spatial phenomena encountered in language as well as in images and videos. Existing spatial…

Computation and Language · Computer Science 2020-07-21 Soham Dan, Parisa Kordjamshidi, Julia Bonn, Archna Bhatia, Jon Cai, Martha Palmer, Dan Roth

Encoding Spatial Relations from Natural Language

Natural language processing has made significant inroads into learning the semantics of words through distributional approaches; however, representations learnt via these methods fail to capture certain kinds of information implicit in the…

Computation and Language · Computer Science 2018-07-06 Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning

This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs)…

Robotics · Computer Science 2025-09-03 Jiading Fang

Metric Learning for Generalizing Spatial Relations to New Objects

Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize…

Robotics · Computer Science 2017-07-25 Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard

Commonsense Spatial Reasoning for Visually Intelligent Agents

Service robots are expected to reliably make sense of complex, fast-changing environments. From a cognitive standpoint, they need the appropriate reasoning capabilities and background knowledge required to exhibit human-like Visual…

Artificial Intelligence · Computer Science 2021-04-02 Agnese Chiatti, Gianluca Bardaro, Enrico Motta, Enrico Daga

Linear Spatial World Models Emerge in Large Language Models

Large language models (LLMs) have demonstrated emergent abilities across diverse tasks, raising the question of whether they acquire internal world models. In this work, we investigate whether LLMs implicitly encode linear spatial world…

Artificial Intelligence · Computer Science 2025-06-04 Matthieu Tehenan, Christian Bolivar Moya, Tenghai Long, Guang Lin

Robust and Interpretable Grounding of Spatial References with Relation Networks

Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal…

Computation and Language · Computer Science 2020-10-08 Tsung-Yen Yang, Andrew S. Lan, Karthik Narasimhan

Understanding, Categorizing and Predicting Semantic Image-Text Relations

Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text…

Multimedia · Computer Science 2019-06-21 Christian Otto, Matthias Springstein, Avishek Anand, Ralph Ewerth

Spatial Computing and Intuitive Interaction: Bringing Mixed Reality and Robotics Together

Spatial computing -- the ability of devices to be aware of their surroundings and to represent this digitally -- offers novel capabilities in human-robot interaction. In particular, the combination of spatial computing and egocentric…

Robotics · Computer Science 2022-02-04 Jeffrey Delmerico, Roi Poranne, Federica Bogo, Helen Oleynikova, Eric Vollenweider, Stelian Coros, Juan Nieto, Marc Pollefeys

Latent Space Planning for Multi-Object Manipulation with Environment-Aware Relational Classifiers

Objects rarely sit in isolation in everyday human environments. If we want robots to operate and perform tasks in our human environments, they must understand how the objects they manipulate will interact with structural elements of the…

Robotics · Computer Science 2024-01-30 Yixuan Huang, Nichols Crawford Taylor, Adam Conkey, Weiyu Liu, Tucker Hermans

Predicting Stable Configurations for Semantic Placement of Novel Objects

Human environments contain numerous objects configured in a variety of arrangements. Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We break this problem…

Robotics · Computer Science 2021-08-30 Chris Paxton, Chris Xie, Tucker Hermans, Dieter Fox

Structured Spatial Reasoning with Open Vocabulary Object Detectors

Reasoning about spatial relationships between objects is essential for many real-world robotic tasks, such as fetch-and-delivery, object rearrangement, and object search. The ability to detect and disambiguate different objects and identify…

Computer Vision and Pattern Recognition · Computer Science 2024-10-11 Negar Nejatishahidin, Madhukar Reddy Vongala, Jana Kosecka

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

Understanding the spatial relations between objects in images is a surprisingly challenging task. A chair may be "behind" a person even if it appears to the left of the person in the image (depending on which way the person is facing). Two…

Computer Vision and Pattern Recognition · Computer Science 2019-09-02 Kaiyu Yang, Olga Russakovsky, Jia Deng