Related papers: Grounding Language Attributes to Objects using Bay…

A Joint Model of Language and Perception for Grounded Attribute Learning

As robots become more ubiquitous and capable, it becomes ever more important to enable untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract…

Computation and Language · Computer Science 2012-07-03 Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, Dieter Fox

ShapeGlot: Learning Language for Shape Differentiation

In this work we explore how fine-grained differences between the shapes of common objects are expressed in language, grounded on images and 3D models of the objects. We first build a large scale, carefully controlled dataset of human…

Computation and Language · Computer Science 2019-05-09 Panos Achlioptas, Judy Fan, Robert X. D. Hawkins, Noah D. Goodman, Leonidas J. Guibas

Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

The human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects with unconstrained natural language descriptions. A core issue for the system is…

Robotics · Computer Science 2017-07-19 Mohit Shridhar, David Hsu

Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following

We study the problem of learning a robot policy to follow natural language instructions that can be easily extended to reason about new objects. We introduce a few-shot language-conditioned object grounding method trained from augmented…

Robotics · Computer Science 2020-11-17 Valts Blukis, Ross A. Knepper, Yoav Artzi

LanguageRefer: Spatial-Language Model for 3D Visual Grounding

For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language to identify common objects in real-world 3D scenes. In this paper,…

Robotics · Computer Science 2021-11-08 Junha Roh, Karthik Desingh, Ali Farhadi, Dieter Fox

Reconstructing and grounding narrated instructional videos in 3D

Narrated instructional videos often show and describe manipulations of similar objects, e.g., repairing a particular model of a car or laptop. In this work we aim to reconstruct such objects and to localize associated narrations in 3D.…

Computer Vision and Pattern Recognition · Computer Science 2021-09-13 Dimitri Zhukov, Ignacio Rocco, Ivan Laptev, Josef Sivic, Johannes L. Schönberger, Bugra Tekin, Marc Pollefeys

Language Grounding with 3D Objects

Seemingly simple natural language requests to a robot are generally underspecified, for example "Can you bring me the wireless mouse?" Flat images of candidate mice may not provide the discriminative information needed for "wireless." The…

Computation and Language · Computer Science 2021-09-16 Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer

Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement

We present a new method, PARsing And visual GrOuNding (ParaGon), for grounding natural language in object placement tasks. Natural language generally describes objects and spatial relations with compositionality and ambiguity, two major…

Robotics · Computer Science 2023-03-14 Zirui Zhao, Wee Sun Lee, David Hsu

Grounding Object Relations in Language-Conditioned Robotic Manipulation with Semantic-Spatial Reasoning

Grounded understanding of natural language in physical scenes can greatly benefit robots that follow human instructions. In object manipulation scenarios, existing end-to-end models are proficient at understanding semantic concepts, but…

Robotics · Computer Science 2023-04-03 Qian Luo, Yunfei Li, Yi Wu

An Iterative Feedback Mechanism for Improving Natural Language Class Descriptions in Open-Vocabulary Object Detection

Recent advances in open-vocabulary object detection models will enable Automatic Target Recognition systems to be sustainable and repurposed by non-technical end-users for a variety of applications or missions. New, and potentially nuanced,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-24 Louis Y. Kim, Michelle Karker, Victoria Valledor, Seiyoung C. Lee, Karl F. Brzoska, Margaret Duff, Anthony Palladino

Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

This paper presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The core issue here is the grounding of referring expressions: infer objects and their relationships from input…

Robotics · Computer Science 2018-06-12 Mohit Shridhar, David Hsu

Zero-Shot Grounding of Objects from Natural Language Queries

A phrase grounding system localizes a particular object in an image referred to by a natural language query. In previous work, the phrases were restricted to have nouns that were encountered in training; we extend the task to Zero-Shot…

Computer Vision and Pattern Recognition · Computer Science 2019-08-21 Arka Sadhu, Kan Chen, Ram Nevatia

Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory

Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally…

Robotics · Computer Science 2024-02-07 Nicolas Harvey Chapman, Feras Dayoub, Will Browne, Chris Lehnert

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding

Localizing objects in 3D scenes based on natural language requires understanding and reasoning about spatial relations. In particular, it is often crucial to distinguish similar objects referred to by the text, such as "the left most chair"…

Computer Vision and Pattern Recognition · Computer Science 2022-11-18 Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

Object Transfiguration replaces an object in an image with another object from a second image. For example it can perform tasks like "putting exactly those eyeglasses from image A on the nose of the person in image B". Usage of exemplar…

Computer Vision and Pattern Recognition · Computer Science 2017-05-16 Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, Weiran He

Video Object Segmentation with Language Referring Expressions

Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of a target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and…

Computer Vision and Pattern Recognition · Computer Science 2019-02-06 Anna Khoreva, Anna Rohrbach, Bernt Schiele

Object Captioning and Retrieval with Natural Language

We address the problem of jointly learning vision and language to understand the object in a fine-grained manner. The key idea of our approach is the use of object descriptions to provide the detailed understanding of an object. Based on…

Computer Vision and Pattern Recognition · Computer Science 2018-03-19 Anh Nguyen, Thanh-Toan Do, Ian Reid, Darwin G. Caldwell, Nikos G. Tsagarakis

Grounding object perception in a naive agent's sensorimotor experience

Artificial object perception usually relies on a priori defined models and feature extraction algorithms. We study how the concept of object can be grounded in the sensorimotor experience of a naive agent. Without any knowledge about itself…

Robotics · Computer Science 2016-09-27 Alban Laflaquière, Nikolas Hemion

You Only Speak Once to See

Grounding objects in images using visual cues is a well-established approach in computer vision, yet the potential of audio as a modality for object recognition and grounding remains underexplored. We introduce YOSS, "You Only Speak Once to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li

Neural Variational Learning for Grounded Language Acquisition

We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms. We present a unified generative method to acquire a shared semantic/visual embedding that enables the learning…

Computation and Language · Computer Science 2021-08-02 Nisha Pillai, Cynthia Matuszek, Francis Ferraro