Related papers: Spatial Language Representation with Multi-Level G…

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does…

Computation and Language · Computer Science 2026-02-10 Abir Harrasse, Florent Draye, Punya Syon Pandey, Zhijing Jin, Bernhard Schölkopf

Probing Multimodal Large Language Models for Global and Local Semantic Representations

The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving…

Computation and Language · Computer Science 2024-11-22 Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao

SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion

Multimodal large language models (MLLMs) have achieved significant progress in image and language tasks due to the strong reasoning capability of large language models (LLMs). Nevertheless, most MLLMs suffer from limited spatial reasoning…

Computer Vision and Pattern Recognition · Computer Science 2025-11-24 Jiajie Guo, Qingpeng Zhu, Jin Zeng, Xiaolong Wu, Changyong He, Weida Wang

MSEarth: A Multimodal Benchmark for Earth Science Phenomenon Discovery with MLLMs

The rapid advancement of multimodal large language models (MLLMs) offers new opportunities for complex scientific challenges, yet their application in earth science, especially at the graduate level, remains underexplored due to a lack of…

Artificial Intelligence · Computer Science 2026-05-05 Xiangyu Zhao, Wanghan Xu, Bo Liu, Yuhao Zhou, Fenghua Ling, Ben Fei, Xiaoyu Yue, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

Empowering Robot Path Planning with Large Language Models: osmAG Map Topology & Hierarchy Comprehension with LLMs

Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge. Mobile robots rely on map comprehension for tasks like localization and navigation. In this paper, we explore…

Robotics · Computer Science 2024-10-25 Fujing Xie, Sören Schwertfeger

GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding

Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models…

Computation and Language · Computer Science 2023-10-24 Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen

The Geometry of Multilingual Language Model Representations

We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language. Using XLM-R as a case study, we show that languages occupy similar linear…

Computation and Language · Computer Science 2022-10-25 Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells

Unsupervised text encoding models have recently fueled substantial progress in NLP. The key idea is to use neural networks to convert words in texts to vector space representations based on word positions in a sentence and their contexts,…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, Ni Lao

Grounding Everything in Tokens for Multimodal Large Language Models

Multimodal large language models (MLLMs) have made significant advancements in vision understanding and reasoning. However, the autoregressive Transformer architecture used by MLLMs requires tokenization on input images, which limits their…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Xiangxuan Ren, Zhongdao Wang, Liping Hou, Pin Tang, Guoqing Wang, Chao Ma

Multiview Scene Graph

A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Juexiao Zhang, Gao Zhu, Sihang Li, Xinhao Liu, Haorui Song, Xinran Tang, Chen Feng

Graph Fusion Across Languages using Large Language Models

Combining multiple knowledge graphs (KGs) across linguistic boundaries is a persistent challenge due to semantic heterogeneity and the complexity of graph environments. We propose a framework for cross-lingual graph fusion, leveraging the…

Computation and Language · Computer Science 2026-03-24 Kaung Myat Kyaw, Khush Agarwal, Jonathan Chan

MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization

Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity intensifies object misclassification and…

Robotics · Computer Science 2025-12-16 Gihyeon Lee, Jungwoo Lee, Juwon Kim, Young-Sik Shin, Younggun Cho

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled…

Computer Vision and Pattern Recognition · Computer Science 2024-04-11 Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

Geometric Scene Parsing with Hierarchical LSTM

This paper addresses the problem of geometric scene parsing, i.e. simultaneously labeling geometric surfaces (e.g. sky, ground and vertical plane) and determining the interaction relations (e.g. layering, supporting, siding and affinity)…

Computer Vision and Pattern Recognition · Computer Science 2016-04-11 Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations

This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and…

Computation and Language · Computer Science 2023-07-10 Yuhan Ji, Song Gao

Navigate Complex Physical Worlds via Geometrically Constrained LLM

This study investigates the potential of Large Language Models (LLMs) for reconstructing and constructing the physical world solely based on textual knowledge. It explores the impact of model performance on spatial understanding abilities.…

Computation and Language · Computer Science 2024-10-24 Yongqiang Huang, Wentao Ye, Liyao Li, Junbo Zhao

The Dual Mechanisms of Spatial Reasoning in Vision-Language Models

Many multimodal tasks, such as image captioning and visual question answering, require vision-language models (VLMs) to associate objects with their properties and spatial relations. Yet it remains unclear where and how such associations…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Kelly Cui, Nikhil Prakash, Ayush Raina, David Bau, Antonio Torralba, Tamar Rott Shaham

Coordinates from Context: Using LLMs to Ground Complex Location References

Geocoding is the task of linking a location reference to an actual geographic location and is essential for many downstream analyses of unstructured text. In this paper, we explore the challenging setting of geocoding compositional location…

Computation and Language · Computer Science 2026-01-27 Tessa Masis, Brendan O'Connor

SurveyG: A Multi-Agent LLM Framework with Hierarchical Citation Graph for Automated Survey Generation

Large language models (LLMs) are increasingly adopted for automating survey paper generation. Existing approaches typically extract…

Artificial Intelligence · Computer Science 2026-02-10 Minh-Anh Nguye, Minh-Duc Nguyen, Ha Lan N. T., Kieu Hai Dang, Nguyen Tien Dong, Dung D. Le