Related papers: Perspective alignment in spatial language

Understanding Spatial Relations through Multiple Modalities

Recognizing spatial relations and reasoning about them is essential in multiple applications including navigation, direction giving and human-computer interaction in general. Spatial relations between objects can either be explicit --…

Computation and Language · Computer Science 2020-07-21 Soham Dan, Hangfeng He, Dan Roth

Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning

This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs)…

Robotics · Computer Science 2025-09-03 Jiading Fang

Grounding Dynamic Spatial Relations for Embodied (Robot) Interaction

This paper presents a computational model of the processing of dynamic spatial relations occurring in an embodied robotic interaction setup. A complete system is introduced that allows autonomous robots to produce and interpret dynamic…

Computation and Language · Computer Science 2016-07-27 Michael Spranger, Jakob Suchan, Mehul Bhatt, Manfred Eppe

Towards Navigation by Reasoning over Spatial Configurations

We deal with the navigation problem where the agent follows natural language instructions while observing the environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions…

Computation and Language · Computer Science 2021-05-17 Yue Zhang, Quan Guo, Parisa Kordjamshidi

Predicting Stable Configurations for Semantic Placement of Novel Objects

Human environments contain numerous objects configured in a variety of arrangements. Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We break this problem…

Robotics · Computer Science 2021-08-30 Chris Paxton, Chris Xie, Tucker Hermans, Dieter Fox

Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods

Spatial reasoning, which requires ability to perceive and manipulate spatial relationships in the 3D world, is a fundamental aspect of human intelligence, yet remains a persistent challenge for Multimodal large language models (MLLMs).…

Artificial Intelligence · Computer Science 2025-11-21 Weichen Liu, Qiyao Xue, Haoming Wang, Xiangyu Yin, Boyuan Yang, Wei Gao

Grounding Object Relations in Language-Conditioned Robotic Manipulation with Semantic-Spatial Reasoning

Grounded understanding of natural language in physical scenes can greatly benefit robots that follow human instructions. In object manipulation scenarios, existing end-to-end models are proficient at understanding semantic concepts, but…

Robotics · Computer Science 2023-04-03 Qian Luo, Yunfei Li, Yi Wu

Commonsense Spatial Reasoning for Visually Intelligent Agents

Service robots are expected to reliably make sense of complex, fast-changing environments. From a cognitive standpoint, they need the appropriate reasoning capabilities and background knowledge required to exhibit human-like Visual…

Artificial Intelligence · Computer Science 2021-04-02 Agnese Chiatti, Gianluca Bardaro, Enrico Motta, Enrico Daga

From Spatial Relations to Spatial Configurations

Spatial Reasoning from language is essential for natural language understanding. Supporting it requires a representation scheme that can capture spatial phenomena encountered in language as well as in images and videos. Existing spatial…

Computation and Language · Computer Science 2020-07-21 Soham Dan, Parisa Kordjamshidi, Julia Bonn, Archna Bhatia, Jon Cai, Martha Palmer, Dan Roth

Getting aligned on representational alignment

Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by…

Neurons and Cognition · Quantitative Biology 2024-11-27 Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J. Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths

Exploring Spatial Schema Intuitions in Large Language and Vision Models

Despite the ubiquity of large language models (LLMs) in AI research, the question of embodiment in LLMs remains underexplored, distinguishing them from embodied systems in robotics where sensory perception directly informs physical action.…

Computation and Language · Computer Science 2024-05-28 Philipp Wicke, Lennart Wachowiak

Structured Spatial Reasoning with Open Vocabulary Object Detectors

Reasoning about spatial relationships between objects is essential for many real-world robotic tasks, such as fetch-and-delivery, object rearrangement, and object search. The ability to detect and disambiguate different objects and identify…

Computer Vision and Pattern Recognition · Computer Science 2024-10-11 Negar Nejatishahidin, Madhukar Reddy Vongala, Jana Kosecka

A Paradigm for Situated and Goal-Driven Language Learning

A distinguishing property of human intelligence is the ability to flexibly use language in order to communicate complex ideas with other humans in a variety of contexts. Research in natural language dialogue should focus on designing…

Computation and Language · Computer Science 2016-10-13 Jon Gauthier, Igor Mordatch

Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication

Effective communication requires the ability to refer to specific parts of an observation in relation to others. While emergent communication literature shows success in developing various language properties, no research has shown the…

Computation and Language · Computer Science 2024-10-29 Olaf Lipinski, Adam J. Sobey, Federico Cerutti, Timothy J. Norman

Representation Learning for Grounded Spatial Reasoning

The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive…

Computation and Language · Computer Science 2017-11-15 Michael Janner, Karthik Narasimhan, Regina Barzilay

A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation

Over the last two decades we have witnessed strong progress on modeling visual object classes, scenes and attributes that have significantly contributed to automated image understanding. On the other hand, surprisingly little progress has…

Computer Vision and Pattern Recognition · Computer Science 2015-05-06 Mateusz Malinowski, Mario Fritz

Conversational Alignment with Artificial Intelligence in Context

The development of sophisticated artificial intelligence (AI) conversational agents based on large language models raises important questions about the relationship between human norms, values, and practices and AI design and performance.…

Computers and Society · Computer Science 2025-05-30 Rachel Katharine Sterken, James Ravi Kirkpatrick

Composing Pick-and-Place Tasks By Grounding Language

Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction. In this work, we present a robot system that follows unconstrained language instructions to pick and place arbitrary…

Robotics · Computer Science 2021-02-17 Oier Mees, Wolfram Burgard

Language models and brains align due to more than next-word prediction and word-level information

Pretrained language models have been shown to significantly predict brain recordings of people comprehending language. Recent work suggests that the prediction of the next word is a key mechanism that contributes to this alignment. What is…

Computation and Language · Computer Science 2024-10-04 Gabriele Merlin, Mariya Toneva

A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science

Over the past year, the development of large language models (LLMs) has brought spatial intelligence into focus, with much attention on vision-based embodied intelligence. However, spatial intelligence spans a broader range of disciplines…

Artificial Intelligence · Computer Science 2025-04-15 Jie Feng, Jinwei Zeng, Qingyue Long, Hongyi Chen, Jie Zhao, Yanxin Xi, Zhilun Zhou, Yuan Yuan, Shengyuan Wang, Qingbin Zeng, Songwei Li, Yunke Zhang, Yuming Lin, Tong Li, Jingtao Ding, Chen Gao, Fengli Xu, Yong Li