Related papers: Language Grounding with 3D Objects

200 papers

The human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects with unconstrained natural language descriptions. A core issue for the system is…

Robotics · Computer Science 2017-07-19 Mohit Shridhar, David Hsu

For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language to identify common objects in real-world 3D scenes. In this paper,…

Robotics · Computer Science 2021-11-08 Junha Roh, Karthik Desingh, Ali Farhadi, Dieter Fox

In this work we explore how fine-grained differences between the shapes of common objects are expressed in language, grounded on images and 3D models of the objects. We first build a large scale, carefully controlled dataset of human…

Computation and Language · Computer Science 2019-05-09 Panos Achlioptas, Judy Fan, Robert X. D. Hawkins, Noah D. Goodman, Leonidas J. Guibas

Grounded understanding of natural language in physical scenes can greatly benefit robots that follow human instructions. In object manipulation scenarios, existing end-to-end models are proficient at understanding semantic concepts, but…

Robotics · Computer Science 2023-04-03 Qian Luo, Yunfei Li, Yi Wu

Localizing 3D objects using natural language is essential for robotic scene understanding. The descriptions often involve multiple spatial relationships to distinguish similar objects, making 3D-language alignment difficult. Current methods…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Feng Xiao, Hongbin Xu, Hai Ci, Wenxiong Kang

Text-based video segmentation is a challenging task that segments out the natural language referred objects in videos. It essentially requires semantic comprehension and fine-grained video understanding. Existing methods introduce language…

Computer Vision and Pattern Recognition · Computer Science 2024-01-22 Chen Liang, Yu Wu, Yawei Luo, Yi Yang

We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. Existing 3D visual grounding tasks focus on localizing a unique object given a text description. However, such a…

Computer Vision and Pattern Recognition · Computer Science 2023-09-12 Yiming Zhang, ZeMing Gong, Angel X. Chang

We develop a system to disambiguate object instances within the same class based on simple physical descriptions. The system takes as input a natural language phrase and a depth image containing a segmented object and predicts how similar…

Robotics · Computer Science 2019-08-05 Vanya Cohen, Benjamin Burchfiel, Thao Nguyen, Nakul Gopalan, Stefanie Tellex, George Konidaris

Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual…

Robotics · Computer Science 2020-06-25 Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex

Localizing objects in 3D scenes based on natural language requires understanding and reasoning about spatial relations. In particular, it is often crucial to distinguish similar objects referred by the text, such as "the left most chair"…

Computer Vision and Pattern Recognition · Computer Science 2022-11-18 Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an…

Computation and Language · Computer Science 2024-04-11 Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance, learning to ground symbols in the physical world. Realistically, this task must…

Artificial Intelligence · Computer Science 2017-06-02 Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy

Robots are finding wider adoption in human environments, increasing the need for natural human-robot interaction. However, understanding a natural language command requires the robot to infer the intended task and how to decompose it into…

Robotics · Computer Science 2026-02-05 Julia Kuhn, Francesco Verdoja, Tsvetomila Mihaylova, Ville Kyrki

Semantic maps allow a robot to reason about its surroundings to fulfill tasks such as navigating known environments, finding specific objects, and exploring unmapped areas. Traditional mapping approaches provide accurate geometric…

Robotics · Computer Science 2026-02-03 Felix Igelbrink, Lennart Niecksch, Marian Renz, Martin Günther, Martin Atzmueller

The contribution of this paper is to provide a semantic model (using soft constraints) of the words used by web-users to describe objects in a language game; a game in which one user describes a selected object of those composing the scene,…

Computation and Language · Computer Science 2010-05-31 Sergio Guadarrama, David P. Pancho

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language. Overcoming this challenge requires the ability…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel

3D visual grounding is the task of localizing the object in a 3D scene which is referred by a description in natural language. With a wide range of applications ranging from autonomous indoor robotics to AR/VR, the task has recently risen…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Ozan Unal, Christos Sakaridis, Suman Saha, Luc Van Gool

As robots become more ubiquitous and capable, it becomes ever more important to enable untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract…

Computation and Language · Computer Science 2012-07-03 Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, Dieter Fox

The integration of language and 3D perception is critical for embodied AI and robotic systems to perceive, understand, and interact with the physical world. Spatial reasoning, a key capability for understanding spatial relationships between…

Computer Vision and Pattern Recognition · Computer Science 2025-07-11 Jiaxin Huang, Ziwen Li, Hanlve Zhang, Runnan Chen, Xiao He, Yandong Guo, Wenping Wang, Tongliang Liu, Mingming Gong

3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query. Although many approaches have been proposed and achieved impressive performance, they all require dense object-sentence pair…

Computer Vision and Pattern Recognition · Computer Science 2023-07-19 Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao