Related papers: Learning Context-Conditioned Predicate Semantics v…

200 papers

Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and dataset long-tail problem. Recently, various unbiased strategies have been proposed by designing novel loss functions and data balancing…

Computer Vision and Pattern Recognition · Computer Science 2023-03-03 Xiaoguang Chang, Teng Wang, Shaowei Cai, Changyin Sun

The scene graph generation (SGG) task involves detecting objects within an image and predicting predicates that represent the relationships between the objects. However, in SGG benchmark datasets, each subject-object pair is annotated with…

Computer Vision and Pattern Recognition · Computer Science 2024-07-26 Jaehyeong Jeon, Kibum Kim, Kanghoon Yoon, Chanyoung Park

Scene Graph Generation (SGG) provides basic language representation of visual scenes, requiring models to grasp complex and diverse semantics between objects. This complexity and diversity in SGG leads to underrepresentation, where parts of…

Computer Vision and Pattern Recognition · Computer Science 2025-04-30 Yuxuan Wang, Xiaoyuan Liu

Multimodal large language models often struggle with faithful reasoning in complex visual scenes, where intricate entities and relations require precise visual grounding at each step. This reasoning unfaithfulness frequently manifests as…

Computer Vision and Pattern Recognition · Computer Science 2026-01-12 Chuhan Wang, Xintong Li, Jennifer Yuntong Zhang, Junda Wu, Chengkai Huang, Lina Yao, Julian McAuley, Jingbo Shang

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding. However, current models are stuck in common…

Computer Vision and Pattern Recognition · Computer Science 2021-08-31 Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

Diverse video captioning aims to generate a set of sentences to describe the given video in various aspects. Mainstream methods are trained with independent pairs of a video and a caption from its ground-truth set without exploiting the…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu

Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image. Prevailing SGG methods traditionally assume a one-off learning process for SGG. This conventional paradigm may necessitate…

Computer Vision and Pattern Recognition · Computer Science 2024-01-29 Tao He, Tongtong Wu, Dongyang Zhang, Guiduo Duan, Ke Qin, Yuan-Fang Li

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how…

Machine Learning · Computer Science 2023-05-23 Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec

Scene Graph Generation (SGG) unifies object localization and visual relationship reasoning by predicting boxes and subject-predicate-object triples. Yet most pipelines treat SGG as a one-shot, deterministic classification problem rather…

Computer Vision and Pattern Recognition · Computer Science 2026-04-22 Xin Hu, Ke Qin, Wen Yin, Yuan-Fang Li, Ming Li, Tao He

Meta learning methods have found success when applied to few shot classification problems, in which they quickly adapt to a small number of labeled examples. Prototypical representations, each representing a particular class, have been of…

Machine Learning · Computer Science 2020-07-21 Evan Vogelbaum, Rumen Dangovski, Li Jing, Marin Soljačić

In real-world environments, AI systems often face unfamiliar scenarios without labeled data, creating a major challenge for conventional scene understanding models. The inability to generalize across unseen contexts limits the deployment of…

Computer Vision and Pattern Recognition · Computer Science 2025-10-31 Manjunath Prasad Holenarasipura Rajiv, B. M. Vidyavathi

Weakly supervised visual grounding (VG) aims to locate objects in images based on text descriptions. Despite significant progress, existing methods lack strong cross-modal reasoning to distinguish subtle semantic differences in text…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Yidan Wang, Chenyi Zhuang, Wutao Liu, Pan Gao, Nicu Sebe

Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes. However, different language preferences of annotators and semantic overlaps between predicates…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann

We propose Context-Adaptive Multi-Prompt Embedding, a novel approach to enrich semantic representations in vision-language contrastive learning. Unlike standard CLIP-style models that rely on a single text embedding, our method introduces…

Machine Learning · Computer Science 2025-08-07 Dahun Kim, Anelia Angelova

Video-and-language pre-training has shown promising improvements on various downstream tasks. Most previous methods capture cross-modal interactions with a transformer-based multimodal encoder, not fully addressing the misalignment between…

Computer Vision and Pattern Recognition · Computer Science 2021-12-24 Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi

Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, due to the diverse visual appearance of numerous possible subject-object combinations, there is a large…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, Jingkuan Song

Today's scene graph generation (SGG) models typically require abundant manual annotations to learn new predicate types. Therefore, it is difficult to apply them to real-world applications with massive uncommon predicate categories whose…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Xingchen Li, Jun Xiao, Guikun Chen, Yinfu Feng, Yi Yang, An-an Liu, Long Chen

Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for the targets that…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Wei Tang, Liang Li, Xuejing Liu, Lu Jin, Jinhui Tang, Zechao Li

Semantic segmentation is still a challenging task for parsing diverse contexts in different scenes, thus the fixed classifier might not be able to well address varying feature distributions during testing. Different from the mainstream…

Computer Vision and Pattern Recognition · Computer Science 2023-03-22 Zhuotao Tian, Jiequan Cui, Li Jiang, Xiaojuan Qi, Xin Lai, Yixin Chen, Shu Liu, Jiaya Jia

Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen