Related papers: Deeply Semantic Inductive Spatio-Temporal Learning

200 papers

Spatio-temporal reasoning is a remarkable capability of Vision Language Models (VLMs), but the underlying mechanisms of such abilities remain largely opaque. We postulate that visual/geometrical and textual representations of spatial…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Raphi Kang, Hongqiao Chen, Georgia Gkioxari, Pietro Perona

Although deep learning-based methods have shown great success in spatiotemporal predictive learning, the framework of those models is designed mainly by intuition. How to make spatiotemporal forecasting with theoretical guarantees is still…

Machine Learning · Computer Science · 2024-02-26 · Tongyi Liang, Han-Xiong Li

Symbolic learning represents the most straightforward approach to interpretable modeling, but its applications have been hampered by a single structural design choice: the adoption of propositional logic as the underlying language.…

Machine Learning · Computer Science · 2021-09-20 · Giovanni Pagliarini, Guido Sciavicco

Executing complex manipulation in cluttered environments requires satisfying coupled geometric and temporal constraints. Although Spatio-Temporal Logic (SpaTiaL) offers a principled specification framework, its use in gradient-based…

Robotics · Computer Science · 2026-04-09 · Licheng Luo, Kaier Liang, Cristian-Ioan Vasile, Mingyu Cai

Visual-spatial understanding, the ability to infer object relationships and layouts from visual input, is fundamental to downstream tasks such as robotic navigation and embodied interaction. However, existing methods face spatial…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-22 · Haoyu Zhang, Meng Liu, Zaijing Li, Haokun Wen, Weili Guan, Yaowei Wang, Liqiang Nie

The spatial reasoning task aims to reason about the spatial relationships in 2D and 3D space, which is a fundamental capability for Visual Question Answering (VQA) and robotics. Although vision language models (VLMs) have developed rapidly…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Xun Liang, Xin Guo, Zhongming Jin, Weihang Pan, Penghui Shang, Deng Cai, Binbin Lin, Jieping Ye

We introduce a dynamical spatio-temporal model formalized as a recurrent neural network for forecasting time series of spatial processes, i.e. series of observations sharing temporal and spatial dependencies. The model learns these…

Machine Learning · Computer Science · 2018-04-24 · Ali Ziat, Edouard Delasalles, Ludovic Denoyer, Patrick Gallinari

Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especially in the robotics domain. Existing vision-language…

Robotics · Computer Science · 2025-11-03 · Simindokht Jahangard, Mehrzad Mohammadi, Abhinav Dhall, Hamid Rezatofighi

Spatio-temporal deep learning models aim to utilize useful patterns in such data to support tasks like prediction. However, previous deep learning models designed for specific tasks typically require separate training for each use case,…

We present a general theory and corresponding declarative model for the embodied grounding and natural language based analytical summarisation of dynamic visuo-spatial imagery. The declarative model, encompassing spatio-linguistic…

Artificial Intelligence · Computer Science · 2015-08-14 · Jakob Suchan, Mehul Bhatt, Harshita Jhavar

Understanding visual scenes requires not only recognizing objects but also reasoning about their spatial relationships. Unlike general vision-language tasks, spatial reasoning requires integrating multiple inductive biases, such as 2D…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-29 · Chan Yeong Hwang, Miso Choi, Sunghyun On, Jinkyu Kim, Jungbeom Lee

Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to…

Machine Learning · Computer Science · 2022-04-11 · Sichen Zhao, Wei Shao, Jeffrey Chan, Flora D. Salim

Spatio-temporal kriging is an important problem in web and social applications, such as Web or Internet of Things, where things (e.g., sensors) connected into a web often come with spatial and temporal properties. It aims to infer knowledge…

Machine Learning · Computer Science · 2023-02-07 · Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, Jianzhong Qi, Chaochao Chen, Longbiao Chen

Vision-language models (VLMs) have advanced multimodal reasoning but still face challenges in spatial reasoning for 3D scenes and complex object configurations. To address this, we introduce SpatialViLT, an enhanced VLM that integrates…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Chashi Mahiul Islam, Oteo Mamo, Samuel Jacob Chacko, Xiuwen Liu, Weikuan Yu

We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long…

Machine Learning · Computer Science · 2016-09-02 · Viorica Patraucean, Ankur Handa, Roberto Cipolla

Understanding how the predictions of deep learning models are formed during the training process is crucial to improve model performance and fix model defects, especially when we need to investigate nontrivial training strategies such as…

Machine Learning · Computer Science · 2022-01-05 · Xianglin Yang, Yun Lin, Ruofan Liu, Zhenfeng He, Chao Wang, Jin Song Dong, Hong Mei

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-17 · Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning. Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video. We propose a…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-29 · Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann

As the field of deep learning steadily transitions from the realm of academic research to practical application, the significance of self-supervised pretraining methods has become increasingly prominent. These methods, particularly in the…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Toni Albert, Bjoern Eskofier, Dario Zanca

People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-11 · R. Kenny Jones, Siddhartha Chaudhuri, Daniel Ritchie