Related papers: Incremental Object Grounding Using Scene Graphs

Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing

This paper presents a framework for jointly grounding objects that follow certain semantic relationship constraints given in a scene graph. A typical natural scene contains several objects, often exhibiting visual relationships of varied…

Computer Vision and Pattern Recognition · Computer Science 2022-11-04 Aditay Tripathi, Anand Mishra, Anirban Chakraborty

Image Semantic Relation Generation

Scene graphs provide structured semantic understanding beyond images. For downstream tasks, such as image retrieval, visual question answering, visual relationship detection, and even autonomous vehicle technology, scene graphs can not only…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Mingzhe Du

Relational Scene Graphs for Object Grounding of Natural Language Commands

Robots are finding wider adoption in human environments, increasing the need for natural human-robot interaction. However, understanding a natural language command requires the robot to infer the intended task and how to decompose it into…

Robotics · Computer Science 2026-02-05 Julia Kuhn, Francesco Verdoja, Tsvetomila Mihaylova, Ville Kyrki

Attribute-based Object Grounding and Robot Grasp Detection with Spatial Reasoning

Enabling robots to grasp objects specified through natural language is essential for effective human-robot interaction, yet it remains a significant challenge. Existing approaches often struggle with open-form language expressions and…

Robotics · Computer Science 2025-09-11 Houjian Yu, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Yuyin Sun, Cheng-Hao Kuo, Arnie Sen, Changhyun Choi

Scene Graph Reasoning for Visual Question Answering

Visual question answering is concerned with answering free-form questions about an image. Since it requires a deep linguistic understanding of the question and the ability to associate it with various objects that are present in the image,…

Machine Learning · Computer Science 2020-07-03 Marcel Hildebrandt, Hang Li, Rajat Koner, Volker Tresp, Stephan Günnemann

Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning

Traditional scene graphs primarily focus on spatial relationships, limiting vision-language models' (VLMs) ability to reason about complex interactions in visual scenes. This paper addresses two key challenges: (1) conventional…

Computer Vision and Pattern Recognition · Computer Science 2025-05-15 Dayong Liang, Changmeng Zheng, Zhiyuan Wen, Yi Cai, Xiao-Yong Wei, Qing Li

Graph-Structured Referring Expression Reasoning in The Wild

Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often…

Computer Vision and Pattern Recognition · Computer Science 2020-04-21 Sibei Yang, Guanbin Li, Yizhou Yu

ArtiSG: Functional 3D Scene Graph Construction via Human-demonstrated Articulated Objects Manipulation

3D scene graphs have empowered robots with semantic understanding for navigation and planning. However, current functional scene graphs primarily focus on static element detection, lacking the actionable kinematic information required for…

Robotics · Computer Science 2026-03-24 Qiuyi Gu, Yuze Sheng, Jincheng Yu, Jiahao Tang, Xiaolong Shan, Zhaoyang Shen, Tinghao Yi, Xiaodan Liang, Xinlei Chen, Yu Wang

Scene Graph Modification as Incremental Structure Expanding

A scene graph is a semantic representation that expresses the objects, attributes, and relationships between objects in a scene. Scene graphs play an important role in many cross modality tasks, as they are able to capture the interactions…

Computer Vision and Pattern Recognition · Computer Science 2022-09-20 Xuming Hu, Zhijiang Guo, Yu Fu, Lijie Wen, Philip S. Yu

SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding

Data augmentation is an essential technique in improving the generalization of deep neural networks. The majority of existing image-domain augmentations either rely on geometric and structural transformations, or apply different kinds of…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Morgan Heisler, Amin Banitalebi-Dehkordi, Yong Zhang

Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object…

Robotics · Computer Science 2023-09-29 Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias

Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

The human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects with unconstrained natural language descriptions. A core issue for the system is…

Robotics · Computer Science 2017-07-19 Mohit Shridhar, David Hsu

LSVG: Language-Guided Scene Graphs with 2D-Assisted Multi-Modal Encoding for 3D Visual Grounding

3D visual grounding aims to localize the unique target described by natural languages in 3D scenes. The significant gap between 3D and language modalities makes it a notable challenge to distinguish multiple similar objects through the…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Feng Xiao, Hongbin Xu, Guocan Zhao, Wenxiong Kang

Situational Scene Graph for Structured Human-centric Situation Understanding

Graph based representation has been widely used in modelling spatio-temporal relationships in video understanding. Although effective, existing graph-based approaches focus on capturing the human-object relationships while ignoring…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Chinthani Sugandhika, Chen Li, Deepu Rajan, Basura Fernando

Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

This paper presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The core issue here is the grounding of referring expressions: infer objects and their relationships from input…

Robotics · Computer Science 2018-06-12 Mohit Shridhar, David Hsu

VSGM -- Enhance robot task understanding ability through visual semantic graph

In recent years, developing AI for robotics has raised much attention. The interaction of vision and language of robots is particularly difficult. We consider that giving robots an understanding of visual semantics and language semantics…

Robotics · Computer Science 2021-05-26 Cheng Yu Tsai, Mu-Chun Su

Open Scene Graphs for Open World Object-Goal Navigation

How can we build robots for open-world semantic navigation tasks, like searching for target objects in novel scenes? While foundation models have the rich knowledge and generalisation needed for these tasks, a suitable scene representation…

Robotics · Computer Science 2024-07-03 Joel Loo, Zhanxin Wu, David Hsu

Grounding Symbols in Multi-Modal Instructions

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance, learning to ground symbols in the physical world. Realistically, this task must…

Artificial Intelligence · Computer Science 2017-06-02 Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

Visual grounding aims to localize the object referred to in an image based on a natural language query. Although progress has been made recently, accurately localizing target objects within multiple-instance distractions (multiple objects…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu

Robust Graph Matching through Semantic Relationship Generation for SLAM

Graph-based representations such as Scene Graphs enable localization in structured indoor environments by matching a locally observed graph, constructed from sensor data, to a prior map. This process is particularly challenging in…

Robotics · Computer Science 2026-04-29 David Perez-Saura, Jose Andres Millan-Romera, Miguel Fernandez-Cortizas, Holger Voos, Pascual Campoy, Jose Luis Sanchez-Lopez