Related papers: Exploring Spatial Intelligence from a Generative P…

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Spatial intelligence is essential for multimodal large language models (MLLMs) operating in the complex physical world. Existing benchmarks, however, probe only single-image relations and thus fail to assess the multi-image spatial…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-26 · Sihan Yang, Runsen Xu, Yiman Xie, Sizhe Yang, Mo Li, Jingli Lin, Chenming Zhu, Xiaochen Chen, Haodong Duan, Xiangyu Yue, Dahua Lin, Tai Wang, Jiangmiao Pang

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

Multimodal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, the very capability that anchors artificial general intelligence in the…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-01 · Zhongang Cai, Yubo Wang, Qingping Sun, Ruisi Wang, Chenyang Gu, Wanqi Yin, Zhiqian Lin, Zhitao Yang, Chen Wei, Oscar Qian, Hui En Pang, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Jiaqi Li, Xiangyu Fan, Hanming Deng, Lewei Lu, Bo Li, Ziwei Liu, Quan Wang, Dahua Lin, Lei Yang

GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

Unified multimodal models integrate the reasoning capacity of large language models with both image understanding and generation, showing great promise for advanced multimodal intelligence. However, the community still lacks a rigorous…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Hongxiang Li, Yaowei Li, Bin Lin, Yuwei Niu, Yuhang Yang, Xiaoshuang Huang, Jiayin Cai, Xiaolong Jiang, Yao Hu, Long Chen

GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unprecedented ability to synthesize information, a critical…

Artificial Intelligence · Computer Science · 2026-01-15 · Jingxuan Wei, Caijun Jia, Xi Bai, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, Lijun Wu, Cheng Tan

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to effectively interact with the physical environment. While multimodal large language models (MLLMs) have made significant strides, existing benchmarks…

Artificial Intelligence · Computer Science · 2026-05-08 · Peiran Xu, Sudong Wang, Yao Zhu, Jianing Li, Gege Qi, Yunjian Zhang

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

Reasoning about dynamic spatial relationships is essential, as both observers and objects often move simultaneously. Although vision-language models (VLMs) and visual expertise models excel in 2D tasks and static scenarios, their ability to…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-22 · Ziang Zhang, Zehan Wang, Guanghao Zhang, Weilong Dai, Yan Xia, Ziang Yan, Minjie Hong, Zhou Zhao

Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds

Spatial intelligence is crucial for vision-language models (VLMs) in the physical world, yet many benchmarks evaluate largely unconstrained scenes where models can exploit 2D shortcuts. We introduce SSI-Bench, a VQA benchmark for spatial…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-10 · Chen Yang, Guanxin Lin, Youquan He, Peiyao Chen, Guanghe Liu, Yufan Mo, Zhouyuan Xu, Linhao Wang, Guohui Zhang, Zihang Zhang, Shenxiang Zeng, Chen Wang, Jiansheng Fan

Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods

Spatial reasoning, which requires the ability to perceive and manipulate spatial relationships in the 3D world, is a fundamental aspect of human intelligence, yet remains a persistent challenge for multimodal large language models (MLLMs).…

Artificial Intelligence · Computer Science · 2025-11-21 · Weichen Liu, Qiyao Xue, Haoming Wang, Xiangyu Yin, Boyuan Yang, Wei Gao

Generative Score Inference for Multimodal Data

Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable…

Machine Learning · Statistics · 2026-03-30 · Xinyu Tian, Xiaotong Shen

Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study

How to integrate and verify spatial intelligence in foundation models remains an open challenge. Current practice often proxies Visual-Spatial Intelligence (VSI) with purely textual prompts and VQA-style scoring, which obscures geometry,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-27 · Guanlin Wu, Boyan Su, Yang Zhao, Pu Wang, Yichen Lin, Hao Frank Yang

Blueprint-Bench: Comparing spatial intelligence of LLMs, agents and image models

We introduce Blueprint-Bench, a benchmark designed to evaluate spatial reasoning capabilities in AI models through the task of converting apartment photographs into accurate 2D floor plans. While the input modality (photographs) is well…

Artificial Intelligence · Computer Science · 2025-10-01 · Lukas Petersson, Axel Backlund, Axel Wennstöm, Hanna Petersson, Callum Sharrock, Arash Dabiri

GenSpace: Benchmarking Spatially-Aware Image Generation

Humans can intuitively compose and arrange scenes in the 3D space for photography. However, can advanced AI image generators plan scenes with similar 3D spatial awareness when creating images from text or image prompts? We present GenSpace,…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-09 · Zehan Wang, Jiayang Xu, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning

Spatial intelligence, which refers to the ability to reason about geometric and physical structure from visual observations, remains a core challenge for multimodal large language models. Despite promising performance, recent multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-21 · Yian Li, Yang Jiao, Bin Zhu, Tianwen Qian, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang

Scaling Spatial Intelligence with Multimodal Foundation Models

Despite remarkable progress, multimodal foundation models still exhibit surprising deficiencies in spatial intelligence. In this work, we explore scaling up multimodal foundation models to cultivate spatial intelligence within the…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Zhongang Cai, Ruisi Wang, Chenyang Gu, Fanyi Pu, Junxiang Xu, Yubo Wang, Wanqi Yin, Zhitao Yang, Chen Wei, Qingping Sun, Tongxi Zhou, Jiaqi Li, Hui En Pang, Oscar Qian, Yukun Wei, Zhiqian Lin, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Xiangyu Fan, Hanming Deng, Lewei Lu, Liang Pan, Bo Li, Ziwei Liu, Quan Wang, Dahua Lin, Lei Yang

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

The use of Multimodal Large Language Models (MLLMs) as an end-to-end solution for Embodied AI and Autonomous Driving has become a prevailing trend. While MLLMs have been extensively studied for visual semantic understanding tasks, their…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-18 · Yun Li, Yiming Zhang, Tao Lin, Xiangrui Liu, Wenxiao Cai, Zheng Liu, Bo Zhao

SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks

Large Language Models (LLMs) have undergone rapid progress, largely attributed to reinforcement learning on complex reasoning tasks. In contrast, while spatial intelligence is fundamental for Vision-Language Models (VLMs) in real-world…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Zijian Song, Xiaoxin Lin, Qiuming Huang, Sihan Qin, Guangrun Wang, Liang Lin

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. While foundation models excel in single-modal…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-09 · Song Wang, Lingdong Kong, Xiaolu Liu, Hao Shi, Wentong Li, Jianke Zhu, Steven C. H. Hoi

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

Humans possess spatial reasoning abilities that enable them to understand spaces through multimodal observations, such as vision and sound. Large multimodal reasoning models extend these abilities by learning to perceive and reason, showing…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-04 · Xu Zheng, Zihao Dongfang, Lutao Jiang, Boyuan Zheng, Yulong Guo, Zhenquan Zhang, Giuliano Albanese, Runyi Yang, Mengjiao Ma, Zixin Zhang, Chenfei Liao, Dingcheng Zhen, Yuanhuiyi Lyu, Yuqian Fu, Bin Ren, Linfeng Zhang, Danda Pani Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu

AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality

Spatial intelligence is important in Architecture, Construction, Science, Technology, Engineering, and Mathematics (STEM), and Medicine. Understanding three-dimensional (3D) spatial rotations can involve verbal descriptions and visual or…

Artificial Intelligence · Computer Science · 2025-03-18 · Uttamasha Monjoree, Wei Yan

Simulating the Real World: A Unified Survey of Multimodal Generative Models

Understanding and replicating the real world is a critical challenge in Artificial General Intelligence (AGI) research. To achieve this, many existing approaches, such as world models, aim to capture the fundamental principles governing the…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Yuqi Hu, Longguang Wang, Xian Liu, Ling-Hao Chen, Yuwei Guo, Yukai Shi, Ce Liu, Anyi Rao, Zeyu Wang, Hui Xiong