Related papers: Meta Compositional Referring Expression Segmentati…

GRES: Generalized Referring Expression Segmentation

Referring Expression Segmentation (RES) aims to generate a segmentation mask for the object described by a given language expression. Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Chang Liu, Henghui Ding, Xudong Jiang

Advancing Referring Expression Segmentation Beyond Single Image

Referring Expression Segmentation (RES) is a widely explored multi-modal task, which endeavors to segment the pre-existing object within a single image with a given linguistic expression. However, in broader real-world scenarios, it is not…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Yixuan Wu, Zhao Zhang, Xie Chi, Feng Zhu, Rui Zhao

GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation

Referring Expression Segmentation (RES) and Comprehension (REC) respectively segment and detect the object described by an expression, while Referring Expression Generation (REG) generates an expression for the selected object. Existing…

Computer Vision and Pattern Recognition · Computer Science 2026-01-09 Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Yu-Gang Jiang

Referring Expression Comprehension: A Survey of Methods and Datasets

Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language. Different from the object detection task that queried object labels have been…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Yanyuan Qiao, Chaorui Deng, Qi Wu

Multimodal Referring Segmentation: A Survey

Multimodal referring segmentation aims to segment target objects in visual scenes, such as images, videos, and 3D scenes, based on referring expressions in text or audio format. This task plays a crucial role in practical applications…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Henghui Ding, Song Tang, Shuting He, Chang Liu, Zuxuan Wu, Yu-Gang Jiang

Referring Expression Object Segmentation with Caption-Aware Consistency

Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in our daily conversations. In this work, we focus on segmenting the object in an image specified by a referring…

Computer Vision and Pattern Recognition · Computer Science 2019-10-11 Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang

Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring…

Computer Vision and Pattern Recognition · Computer Science 2020-03-03 Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, Qi Wu

Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities

Referring expression segmentation (RES) aims at segmenting the entities' masks that match the descriptive language expression. While traditional RES methods primarily address object-level grounding, real-world scenarios demand a more…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Jing Liu, Wenxuan Wang, Yisi Zhang, Yepeng Tang, Xingjian He, Longteng Guo, Tongtian Yue, Xinlong Wang

SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation

Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. However, collecting large datasets for these tasks is expensive in terms of annotation time,…

Computer Vision and Pattern Recognition · Computer Science 2021-06-10 Ioannis Kazakos, Carles Ventura, Miriam Bellver, Carina Silberer, Xavier Giro-i-Nieto

MMNet: Multi-Mask Network for Referring Image Segmentation

Referring image segmentation aims to segment an object referred to by natural language expression from an image. However, this task is challenging due to the distinct data properties between text and image, and the randomness introduced by…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Yichen Yan, Xingjian He, Wenxuan Wan, Jing Liu

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

Referring expression segmentation (RES) aims at segmenting the foreground masks of the entities that match the descriptive natural language expression. Previous datasets and methods for classic RES task heavily rely on the prior assumption…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the…

Computer Vision and Pattern Recognition · Computer Science 2016-08-31 Ronghang Hu, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell

Referring Image Segmentation via Cross-Modal Progressive Comprehension

Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression. Previous approaches tackle this problem using implicit feature interaction…

Computer Vision and Pattern Recognition · Computer Science 2020-10-02 Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

Latent Expression Generation for Referring Image Segmentation and Grounding

Visual grounding tasks, such as referring image segmentation (RIS) and referring expression comprehension (REC), aim to localize a target object based on a given textual description. The target object in an image can be described in…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Seonghoon Yu, Junbeom Hong, Joonseok Lee, Jeany Son

Cross-Modal Progressive Comprehension for Referring Segmentation

Given a natural language expression and an image/video, the goal of referring segmentation is to produce the pixel-level masks of the entities described by the subject of the expression. Previous approaches tackle this problem by implicit…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

We present REM, a framework for segmenting a wide range of concepts in video that can be described through natural language. Our method leverages the universal visual-language mapping learned by video diffusion models on Internet-scale data…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert

SynRES: Towards Referring Expression Segmentation in the Wild via Synthetic Data

Despite the advances in Referring Expression Segmentation (RES) benchmarks, their evaluation protocols remain constrained, primarily focusing on either single targets with short queries (containing minimal attributes) or multiple targets…

Machine Learning · Computer Science 2025-05-26 Dong-Hee Kim, Hyunjee Song, Donghyun Kim

New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration

Referring Expression Comprehension (REC) is a foundational cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding. It serves as an essential testing ground for…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Xuzheng Yang, Junzhuo Liu, Peng Wang, Guoqing Wang, Yang Yang, Heng Tao Shen

Towards Omni-supervised Referring Expression Segmentation

Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is plagued by the expensive segmentation labels. To address…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun

3D-GRES: Generalized 3D Referring Expression Segmentation

3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-01 Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji