Related papers: Exploring Sparse Spatial Relation in Graph Inferen…

Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering

One of the key issues of Visual Question Answering (VQA) is to reason with semantic clues in the visual content under the guidance of the question; how to model relational semantics still remains a great challenge. To fully capture…

Multimedia · Computer Science 2019-08-22 Zhuoqian Yang, Zengchang Qin, Jing Yu, Yue Hu

SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization

Over the past few years, significant progress has been made in deep convolutional neural networks (CNNs)-based image recognition. This is mainly due to the strong ability of such networks in mining discriminative object pose and parts…

Computer Vision and Pattern Recognition · Computer Science 2022-10-05 Asish Bera, Zachary Wharton, Yonghuai Liu, Nik Bessis, Ardhendu Behera

Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer

Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying…

Computer Vision and Pattern Recognition · Computer Science 2021-09-01 Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang, Jin-Hwa Kim

Structured Sparse R-CNN for Direct Scene Graph Generation

Scene graph generation (SGG) is to detect object pairs with their relations in an image. Existing SGG approaches often use multi-stage pipelines to decompose this task into object detection, relation graph construction, and dense or…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Yao Teng, Limin Wang

Relation-Aware Graph Attention Network for Visual Question Answering

In order to answer semantically-complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects. We propose…

Computer Vision and Pattern Recognition · Computer Science 2019-10-11 Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu

Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture

Previous studies such as VizWiz find that Visual Question Answering (VQA) systems that can read and reason about text in images are useful in application areas such as assisting visually-impaired people. TextVQA is a VQA dataset geared…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Michael Yang, Aditya Anantharaman, Zachary Kitowski, Derik Clive Robert

Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering

The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions. Existing graph-based methods for VideoQA usually ignore keywords in…

Computer Vision and Pattern Recognition · Computer Science 2023-07-26 Yi Cheng, Hehe Fan, Dongyun Lin, Ying Sun, Mohan Kankanhalli, Joo-Hwee Lim

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

Text-based Visual Question Answering (TextVQA) aims to produce correct answers for given questions about the images with multiple scene texts. In most cases, the texts naturally attach to the surface of the objects. Therefore, spatial…

Computer Vision and Pattern Recognition · Computer Science 2023-06-16 Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word…

Computation and Language · Computer Science 2022-03-22 Yinhua Piao, Sangseon Lee, Dohoon Lee, Sun Kim

Improving Vision-and-Language Reasoning via Spatial Relations Modeling

Visual commonsense reasoning (VCR) is a challenging multi-modal task, which requires high-level cognition and commonsense reasoning ability about the real world. In recent years, large-scale pre-training approaches have been developed and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-10 Cheng Yang, Rui Xu, Ye Guo, Peixiang Huang, Yiru Chen, Wenkui Ding, Zhongyuan Wang, Hong Zhou

Sparse Graph Learning from Spatiotemporal Time Series

Outstanding achievements of graph neural networks for spatiotemporal time series analysis show that relational constraints introduce an effective inductive bias into neural forecasting architectures. Often, however, the relational…

Machine Learning · Computer Science 2023-08-03 Andrea Cini, Daniele Zambon, Cesare Alippi

Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

For a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue to understand the contextual information presented in the video. With the effective…

Computer Vision and Pattern Recognition · Computer Science 2021-08-20 Ning Wang, Guangming Zhu, Liang Zhang, Peiyi Shen, Hongsheng Li, Cong Hua

Associative Knowledge Graphs for Efficient Sequence Storage and Retrieval

The paper addresses challenges in storing and retrieving sequences in contexts like anomaly detection, behavior prediction, and genetic information analysis. Associative Knowledge Graphs (AKGs) offer a promising approach by leveraging…

Artificial Intelligence · Computer Science 2025-09-11 Przemysław Stokłosa, Janusz A. Starzyk, Paweł Raif, Adrian Horzyk, Marcin Kowalik

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Comprehensive visual understanding requires detection frameworks that can effectively learn and utilize object interactions while analyzing objects individually. This is the main objective in the Human-Object Interaction (HOI) detection task.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-13 Oytun Ulutan, A S M Iftekhar, B. S. Manjunath

Spatially Aware Multimodal Transformers for TextVQA

Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop this assistive technology, we study the TextVQA task, i.e., reasoning about text in images to answer a question. Existing approaches…

Computer Vision and Pattern Recognition · Computer Science 2020-12-24 Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

Joint learning of object graph and relation graph for visual question answering

Modeling visual question answering (VQA) through scene graphs can significantly improve the reasoning accuracy and interpretability. However, existing models answer poorly for complex reasoning questions with attributes or relations, which…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun

Learning Time-aware Graph Structures for Spatially Correlated Time Series Forecasting

Spatio-temporal forecasting of future values of spatially correlated time series is important across many cyber-physical systems (CPS). Recent studies offer evidence that the use of graph neural networks to capture latent correlations…

Machine Learning · Computer Science 2023-12-29 Minbo Ma, Jilin Hu, Christian S. Jensen, Fei Teng, Peng Han, Zhiqiang Xu, Tianrui Li

Graph Pruning Based Spatial and Temporal Graph Convolutional Network with Transfer Learning for Traffic Prediction

With the process of urbanization and the rapid growth of population, the issue of traffic congestion has become an increasingly critical concern. Intelligent transportation systems heavily rely on real-time and precise prediction algorithms…

Artificial Intelligence · Computer Science 2025-01-03 Zihao Jing

Graph-based Virtual Sensing from Sparse and Partial Multivariate Observations

Virtual sensing techniques allow for inferring signals at new unmonitored locations by exploiting spatio-temporal measurements coming from physical sensors at different locations. However, as the sensor coverage becomes sparse due to costs…

Machine Learning · Computer Science 2024-02-21 Giovanni De Felice, Andrea Cini, Daniele Zambon, Vladimir V. Gusev, Cesare Alippi

Weakly Supervised Visual Semantic Parsing

Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval. Nevertheless,…

Computer Vision and Pattern Recognition · Computer Science 2020-04-02 Alireza Zareian, Svebor Karaman, Shih-Fu Chang