Related papers: Learning Context-Conditioned Predicate Semantics v…

LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation

Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and dataset long-tail problem. Recently, various unbiased strategies have been proposed by designing novel loss functions and data balancing…

Computer Vision and Pattern Recognition · Computer Science 2023-03-03 Xiaoguang Chang, Teng Wang, Shaowei Cai, Changyin Sun

Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation

The scene graph generation (SGG) task involves detecting objects within an image and predicting predicates that represent the relationships between the objects. However, in SGG benchmark datasets, each subject-object pair is annotated with…

Computer Vision and Pattern Recognition · Computer Science 2024-07-26 Jaehyeong Jeon, Kibum Kim, Kanghoon Yoon, Chanyoung Park

Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement

Scene Graph Generation (SGG) provides basic language representation of visual scenes, requiring models to grasp complex and diverse semantics between objects. This complexity and diversity in SGG leads to underrepresentation, where parts of…

Computer Vision and Pattern Recognition · Computer Science 2025-04-30 Yuxuan Wang, Xiaoyuan Liu

SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes

Multimodal large language models often struggle with faithful reasoning in complex visual scenes, where intricate entities and relations require precise visual grounding at each step. This reasoning unfaithfulness frequently manifests as…

Computer Vision and Pattern Recognition · Computer Science 2026-01-12 Chuhan Wang, Xintong Li, Jennifer Yuntong Zhang, Junda Wu, Chengkai Huang, Lina Yao, Julian McAuley, Jingbo Shang

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding. However, current models are stuck in common…

Computer Vision and Pattern Recognition · Computer Science 2021-08-31 Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

Diverse video captioning aims to generate a set of sentences to describe the given video in various aspects. Mainstream methods are trained with independent pairs of a video and a caption from its ground-truth set without exploiting the…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu

Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning

Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image. Prevailing SGG methods traditionally assume a one-off learning process for SGG. This conventional paradigm may necessitate…

Computer Vision and Pattern Recognition · Computer Science 2024-01-29 Tao He, Tongtong Wu, Dongyang Zhang, Guiduo Duan, Ke Qin, Yuan-Fang Li

PRODIGY: Enabling In-context Learning Over Graphs

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how…

Machine Learning · Computer Science 2023-05-23 Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec

Can We Build Scene Graphs, Not Classify Them? FlowSG: Progressive Image-Conditioned Scene Graph Generation with Flow Matching

Scene Graph Generation (SGG) unifies object localization and visual relationship reasoning by predicting boxes and subject-predicate-object triples. Yet most pipelines treat SGG as a one-shot, deterministic classification problem rather…

Computer Vision and Pattern Recognition · Computer Science 2026-04-22 Xin Hu, Ke Qin, Wen Yin, Yuan-Fang Li, Ming Li, Tao He

Contextualizing Enhances Gradient Based Meta Learning

Meta learning methods have found success when applied to few shot classification problems, in which they quickly adapt to a small number of labeled examples. Prototypical representations, each representing a particular class, have been of…

Machine Learning · Computer Science 2020-07-21 Evan Vogelbaum, Rumen Dangovski, Li Jing, Marin Soljačić

Dynamic Context-Aware Scene Reasoning Using Vision-Language Alignment in Zero-Shot Real-World Scenarios

In real-world environments, AI systems often face unfamiliar scenarios without labeled data, creating a major challenge for conventional scene understanding models. The inability to generalize across unseen contexts limits the deployment of…

Computer Vision and Pattern Recognition · Computer Science 2025-10-31 Manjunath Prasad Holenarasipura Rajiv, B. M. Vidyavathi

AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding

Weakly supervised visual grounding (VG) aims to locate objects in images based on text descriptions. Despite significant progress, existing methods lack strong cross-modal reasoning to distinguish subtle semantic differences in text…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Yidan Wang, Chenyi Zhuang, Wutao Liu, Pan Gao, Nicu Sebe

Panoptic Scene Graph Generation with Semantics-Prototype Learning

Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes. However, different language preferences of annotators and semantic overlaps between predicates…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann

Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment

We propose Context-Adaptive Multi-Prompt Embedding, a novel approach to enrich semantic representations in vision-language contrastive learning. Unlike standard CLIP-style models that rely on a single text embedding, our method introduces…

Machine Learning · Computer Science 2025-08-07 Dahun Kim, Anelia Angelova

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Video-and-language pre-training has shown promising improvements on various downstream tasks. Most previous methods capture cross-modal interactions with a transformer-based multimodal encoder, not fully addressing the misalignment between…

Computer Vision and Pattern Recognition · Computer Science 2021-12-24 Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi

Prototype-based Embedding Network for Scene Graph Generation

Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, due to the diverse visual appearance of numerous possible subject-object combinations, there is a large…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, Jingkuan Song

Decomposed Prototype Learning for Few-Shot Scene Graph Generation

Today's scene graph generation (SGG) models typically require abundant manual annotations to learn new predicate types. Therefore, it is difficult to apply them to real-world applications with massive uncommon predicate categories whose…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Xingchen Li, Jun Xiao, Guikun Chen, Yinfu Feng, Yi Yang, An-an Liu, Long Chen

Context Disentangling and Prototype Inheriting for Robust Visual Grounding

Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for the targets that…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Wei Tang, Liang Li, Xuejing Liu, Lu Jin, Jinhui Tang, Zechao Li

Learning Context-aware Classifier for Semantic Segmentation

Semantic segmentation is still a challenging task for parsing diverse contexts in different scenes, thus the fixed classifier might not be able to well address varying feature distributions during testing. Different from the mainstream…

Computer Vision and Pattern Recognition · Computer Science 2023-03-22 Zhuotao Tian, Jiequan Cui, Li Jiang, Xiaojuan Qi, Xin Lai, Yixin Chen, Shu Liu, Jiaya Jia

Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen