Related papers: Spatial Language Representation with Multi-Level G…

200 papers

Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does…

Computation and Language · Computer Science 2026-02-10 Abir Harrasse, Florent Draye, Punya Syon Pandey, Zhijing Jin, Bernhard Schölkopf

The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving…

Computation and Language · Computer Science 2024-11-22 Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao

Multimodal large language models (MLLMs) have achieved significant progress in image and language tasks due to the strong reasoning capability of large language models (LLMs). Nevertheless, most MLLMs suffer from limited spatial reasoning…

Computer Vision and Pattern Recognition · Computer Science 2025-11-24 Jiajie Guo, Qingpeng Zhu, Jin Zeng, Xiaolong Wu, Changyong He, Weida Wang

The rapid advancement of multimodal large language models (MLLMs) offers new opportunities for complex scientific challenges, yet their application in earth science, especially at the graduate level, remains underexplored due to a lack of…

Artificial Intelligence · Computer Science 2026-05-05 Xiangyu Zhao, Wanghan Xu, Bo Liu, Yuhao Zhou, Fenghua Ling, Ben Fei, Xiaoyu Yue, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge. Mobile robots rely on map comprehension for tasks like localization and navigation. In this paper, we explore…

Robotics · Computer Science 2024-10-25 Fujing Xie, Sören Schwertfeger

Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models…

Computation and Language · Computer Science 2023-10-24 Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen

We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language. Using XLM-R as a case study, we show that languages occupy similar linear…

Computation and Language · Computer Science 2022-10-25 Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

Unsupervised text encoding models have recently fueled substantial progress in NLP. The key idea is to use neural networks to convert words in texts to vector space representations based on word positions in a sentence and their contexts,…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, Ni Lao

Multimodal large language models (MLLMs) have made significant advancements in vision understanding and reasoning. However, the autoregressive Transformer architecture used by MLLMs requires tokenization of input images, which limits their…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Xiangxuan Ren, Zhongdao Wang, Liping Hou, Pin Tang, Guoqing Wang, Chao Ma

A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Juexiao Zhang, Gao Zhu, Sihang Li, Xinhao Liu, Haorui Song, Xinran Tang, Chen Feng

Combining multiple knowledge graphs (KGs) across linguistic boundaries is a persistent challenge due to semantic heterogeneity and the complexity of graph environments. We propose a framework for cross-lingual graph fusion, leveraging the…

Computation and Language · Computer Science 2026-03-24 Kaung Myat Kyaw, Khush Agarwal, Jonathan Chan

Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity intensifies object misclassification and…

Robotics · Computer Science 2025-12-16 Gihyeon Lee, Jungwoo Lee, Juwon Kim, Young-Sik Shin, Younggun Cho

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled…

Computer Vision and Pattern Recognition · Computer Science 2024-04-11 Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

This paper addresses the problem of geometric scene parsing, i.e. simultaneously labeling geometric surfaces (e.g. sky, ground and vertical plane) and determining the interaction relations (e.g. layering, supporting, siding and affinity)…

Computer Vision and Pattern Recognition · Computer Science 2016-04-11 Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and…

Computation and Language · Computer Science 2023-07-10 Yuhan Ji, Song Gao

This study investigates the potential of Large Language Models (LLMs) for reconstructing and constructing the physical world solely based on textual knowledge. It explores the impact of model performance on spatial understanding abilities.…

Computation and Language · Computer Science 2024-10-24 Yongqiang Huang, Wentao Ye, Liyao Li, Junbo Zhao

Many multimodal tasks, such as image captioning and visual question answering, require vision-language models (VLMs) to associate objects with their properties and spatial relations. Yet it remains unclear where and how such associations…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Kelly Cui, Nikhil Prakash, Ayush Raina, David Bau, Antonio Torralba, Tamar Rott Shaham

Geocoding is the task of linking a location reference to an actual geographic location and is essential for many downstream analyses of unstructured text. In this paper, we explore the challenging setting of geocoding compositional location…

Computation and Language · Computer Science 2026-01-27 Tessa Masis, Brendan O'Connor

Large language models (LLMs) are increasingly adopted for automating survey paper generation \cite{wang2406autosurvey, liang2025surveyx, yan2025surveyforge,su2025benchmarking,wen2025interactivesurvey}. Existing approaches typically extract…

Artificial Intelligence · Computer Science 2026-02-10 Minh-Anh Nguye, Minh-Duc Nguyen, Ha Lan N. T., Kieu Hai Dang, Nguyen Tien Dong, Dung D. Le