Related papers: Modeling Context Between Objects for Referring Exp…

Modeling Context in Referring Expressions

Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on…

Computer Vision and Pattern Recognition · Computer Science · 2016-08-11 · Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg

Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments

Referring to objects in a natural and unambiguous manner is crucial for effective human-robot interaction. Previous research on learning-based referring expressions has focused primarily on comprehension tasks, while generating referring…

Robotics · Computer Science · 2021-04-20 · Fethiye Irmak Doğan, Sinan Kalkan, Iolanda Leite

Referring Expression Object Segmentation with Caption-Aware Consistency

Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in our daily conversations. In this work, we focus on segmenting the object in an image specified by a referring…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-11 · Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang

Understanding Synonymous Referring Expressions via Contrastive Features

Referring expression comprehension aims to localize objects identified by natural language descriptions. This is a challenging task as it requires understanding of both visual and language domains. One nature is that each object can be…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-21 · Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition

Context, as referred to situational factors related to the object of interest, can help infer the object's states or properties in visual recognition. As such contextual features are too diverse (across instances) to be annotated, existing…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-11 · Mingzhou Liu, Xinwei Sun, Fandong Zhang, Yizhou Yu, Yizhou Wang

Searching for Ambiguous Objects in Videos using Relational Referring Expressions

Humans frequently use referring (identifying) expressions to refer to objects. Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing,…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-21 · Hazan Anayurt, Sezai Artun Ozyegin, Ulfet Cetin, Utku Aktas, Sinan Kalkan

Context-Aware Temporal Embedding of Objects in Video Data

In video analysis, understanding the temporal context is crucial for recognizing object interactions, event patterns, and contextual changes over time. The proposed model leverages adjacency and semantic similarities between objects from…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-26 · Ahnaf Farhan, M. Shahriar Hossain

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., "largest elephant standing behind baby elephant". This is a general yet challenging vision-language task since it does not only require the…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-09 · Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang

Referring Expression Comprehension: A Survey of Methods and Datasets

Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language. Different from the object detection task that queried object labels have been…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-08 · Yanyuan Qiao, Chaorui Deng, Qi Wu

Visual Referring Expression Recognition: What Do Systems Actually Learn?

We present an empirical analysis of the state-of-the-art systems for referring expression recognition -- the task of identifying the object in an image referred to by a natural language expression -- with the goal of gaining insight into…

Computation and Language · Computer Science · 2018-05-31 · Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick

Exploring Person Context and Local Scene Context for Object Detection

In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object…

Computer Vision and Pattern Recognition · Computer Science · 2015-11-26 · Saurabh Gupta, Bharath Hariharan, Jitendra Malik

Detecting Objects with Context-Likelihood Graphs and Graph Refinement

The goal of this paper is to detect objects by exploiting their interrelationships. Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly. We first…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-28 · Aritra Bhowmik, Yu Wang, Nora Baka, Martin R. Oswald, Cees G. M. Snoek

Recurrent Multimodal Interaction for Referring Image Segmentation

In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e. referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segment…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-08 · Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille

Using Depth for Improving Referring Expression Comprehension in Real-World Environments

In a human-robot collaborative task where a robot helps its partner by finding described objects, the depth dimension plays a critical role in successful task completion. Existing studies have mostly focused on comprehending the object…

Robotics · Computer Science · 2021-07-13 · Fethiye Irmak Dogan, Iolanda Leite

Resilience through Scene Context in Visual Referring Expression Generation

Scene context is well known to facilitate humans' perception of visible objects. In this paper, we investigate the role of context in Referring Expression Generation (REG) for objects in images, where existing research has often focused on…

Computation and Language · Computer Science · 2024-08-26 · Simeon Junker, Sina Zarrieß

Generation and Comprehension of Unambiguous Object Descriptions

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being…

Computer Vision and Pattern Recognition · Computer Science · 2016-04-12 · Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-03 · Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, Qi Wu

Grounding Referring Expressions in Images by Variational Context

We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., "largest elephant standing behind baby elephant". This is a general yet challenging vision-language task since it does not only require the…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-03 · Hanwang Zhang, Yulei Niu, Shih-Fu Chang

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the…

Computer Vision and Pattern Recognition · Computer Science · 2016-08-31 · Ronghang Hu, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell

Generating Easy-to-Understand Referring Expressions for Target Identifications

This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. As a target becomes relatively less salient, identifying referred objects itself becomes more…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-30 · Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada