Related papers: Deeply Semantic Inductive Spatio-Temporal Learning

Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models

Spatio-temporal reasoning is a remarkable capability of Vision Language Models (VLMs), but the underlying mechanisms of such abilities remain largely opaque. We postulate that visual/geometrical and textual representations of spatial…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Raphi Kang, Hongqiao Chen, Georgia Gkioxari, Pietro Perona

Spatiotemporal Observer Design for Predictive Learning of High-Dimensional Data

Although deep learning-based methods have shown great success in spatiotemporal predictive learning, the framework of those models is designed mainly by intuition. How to make spatiotemporal forecasting with theoretical guarantees is still…

Machine Learning · Computer Science · 2024-02-26 · Tongyi Liang, Han-Xiong Li

Decision Tree Learning with Spatial Modal Logics

Symbolic learning represents the most straightforward approach to interpretable modeling, but its applications have been hampered by a single structural design choice: the adoption of propositional logic as the underlying language.…

Machine Learning · Computer Science · 2021-09-20 · Giovanni Pagliarini, Guido Sciavicco

Differentiable SpaTiaL: Symbolic Learning and Reasoning with Geometric Temporal Logic for Manipulation Tasks

Executing complex manipulation in cluttered environments requires satisfying coupled geometric and temporal constraints. Although Spatio-Temporal Logic (SpaTiaL) offers a principled specification framework, its use in gradient-based…

Robotics · Computer Science · 2026-04-09 · Licheng Luo, Kaier Liang, Cristian-Ioan Vasile, Mingyu Cai

Spatial Understanding from Videos: Structured Prompts Meet Simulation Data

Visual-spatial understanding, the ability to infer object relationships and layouts from visual input, is fundamental to downstream tasks such as robotic navigation and embodied interaction. However, existing methods face spatial…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-22 · Haoyu Zhang, Meng Liu, Zaijing Li, Haokun Wen, Weili Guan, Yaowei Wang, Liqiang Nie

Enhancing Spatial Reasoning through Visual and Textual Thinking

The spatial reasoning task aims to reason about the spatial relationships in 2D and 3D space, which is a fundamental capability for Visual Question Answering (VQA) and robotics. Although vision language models (VLMs) have developed rapidly…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Xun Liang, Xin Guo, Zhongming Jin, Weihang Pan, Penghui Shang, Deng Cai, Binbin Lin, Jieping Ye

Spatio-Temporal Neural Networks for Space-Time Series Forecasting and Relations Discovery

We introduce a dynamical spatio-temporal model formalized as a recurrent neural network for forecasting time series of spatial processes, i.e. series of observations sharing temporal and spatial dependencies. The model learns these…

Machine Learning · Computer Science · 2018-04-24 · Ali Ziat, Edouard Delasalles, Ludovic Denoyer, Patrick Gallinari

A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics

Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especially in the robotics domain. Existing vision-language…

Robotics · Computer Science · 2025-11-03 · Simindokht Jahangard, Mehrzad Mohammadi, Abhinav Dhall, Hamid Rezatofighi

Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review

Spatio-temporal deep learning models aim to utilize useful patterns in such data to support tasks like prediction. However, previous deep learning models designed for specific tasks typically require separate training for each use case,…

Machine Learning · Computer Science · 2025-06-03 · Yuchen Fang, Hao Miao, Yuxuan Liang, Liwei Deng, Yue Cui, Ximu Zeng, Yuyang Xia, Yan Zhao, Torben Bach Pedersen, Christian S. Jensen, Xiaofang Zhou, Kai Zheng

Talking about the Moving Image: A Declarative Model for Image Schema Based Embodied Perception Grounding and Language Generation

We present a general theory and corresponding declarative model for the embodied grounding and natural-language-based analytical summarisation of dynamic visuo-spatial imagery. The declarative model, encompassing spatio-linguistic…

Artificial Intelligence · Computer Science · 2015-08-14 · Jakob Suchan, Mehul Bhatt, Harshita Jhavar

SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning

Understanding visual scenes requires not only recognizing objects but also reasoning about their spatial relationships. Unlike general vision-language tasks, spatial reasoning requires integrating multiple inductive biases, such as 2D…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-29 · Chan Yeong Hwang, Miso Choi, Sunghyun On, Jinkyu Kim, Jungbeom Lee

Measuring disentangled generative spatio-temporal representation

Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to…

Machine Learning · Computer Science · 2022-04-11 · Sichen Zhao, Wei Shao, Jeffrey Chan, Flora D. Salim

INCREASE: Inductive Graph Representation Learning for Spatio-Temporal Kriging

Spatio-temporal kriging is an important problem in web and social applications, such as Web or Internet of Things, where things (e.g., sensors) connected into a web often come with spatial and temporal properties. It aims to infer knowledge…

Machine Learning · Computer Science · 2023-02-07 · Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, Jianzhong Qi, Chaochao Chen, Longbiao Chen

Spatial-ViLT: Enhancing Visual Spatial Reasoning through Multi-Task Learning

Vision-language models (VLMs) have advanced multimodal reasoning but still face challenges in spatial reasoning for 3D scenes and complex object configurations. To address this, we introduce SpatialViLT, an enhanced VLM that integrates…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Chashi Mahiul Islam, Oteo Mamo, Samuel Jacob Chacko, Xiuwen Liu, Weikuan Yu

Spatio-temporal video autoencoder with differentiable memory

We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long…

Machine Learning · Computer Science · 2016-09-02 · Viorica Patraucean, Ankur Handa, Roberto Cipolla

DeepVisualInsight: Time-Travelling Visualization for Spatio-Temporal Causality of Deep Classification Training

Understanding how the predictions of deep learning models are formed during the training process is crucial to improve model performance and fix model defects, especially when we need to investigate nontrivial training strategies such as…

Machine Learning · Computer Science · 2022-01-05 · Xianglin Yang, Yun Lin, Ruofan Liu, Zhenfeng He, Chao Wang, Jin Song Dong, Hong Mei

Visual Semantic Planning using Deep Successor Representations

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-17 · Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference

Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning. Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video. We propose a…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-29 · Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann

From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations

As the field of deep learning steadily transitions from the realm of academic research to practical application, the significance of self-supervised pretraining methods has become increasingly prominent. These methods, particularly in the…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Toni Albert, Bjoern Eskofier, Dario Zanca

Learning to Infer Generative Template Programs for Visual Concepts

People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-11 · R. Kenny Jones, Siddhartha Chaudhuri, Daniel Ritchie