Related papers: Visual Reference Resolution using Attention Memory…

Recursive Visual Attention in Visual Dialog

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image. It typically needs to address two major problems: (1) How to answer visually-grounded questions, which is the core…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

Visual dialog (VisDial) is a task which requires an AI agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), the series of questions should be able to capture a temporal context from a dialog…

Computer Vision and Pattern Recognition · Computer Science 2019-08-30 Gi-Cheon Kang, Jaeseo Lim, Byoung-Tak Zhang

Modeling Coreference Relations in Visual Dialog

Visual dialog is a vision-language task where an agent needs to answer a series of questions grounded in an image based on the understanding of the dialog history and the image. The occurrences of coreference relations in the dialog makes…

Computer Vision and Pattern Recognition · Computer Science 2022-03-08 Mingxiao Li, Marie-Francine Moens

Reciprocal Attention Fusion for Visual Question Answering

Existing attention mechanisms either attend to local image grid or object level features for Visual Question Answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2021-08-30 Moshiur R Farazi, Salman H Khan

Learning to Agree on Vision Attention for Visual Commonsense Reasoning

Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning. A VCR model generally aims at answering a textual question regarding an image, followed by the rationale prediction…

Computer Vision and Pattern Recognition · Computer Science 2023-02-21 Zhenyang Li, Yangyang Guo, Kejie Wang, Fan Liu, Liqiang Nie, Mohan Kankanhalli

Dual Recurrent Attention Units for Visual Question Answering

Visual Question Answering (VQA) requires AI models to comprehend data in two domains, vision and text. Current state-of-the-art models use learned attention mechanisms to extract relevant information from the input domains to answer a…

Artificial Intelligence · Computer Science 2019-03-27 Ahmed Osman, Wojciech Samek

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the…

Computer Vision and Pattern Recognition · Computer Science 2018-12-04 Duy-Kien Nguyen, Takayuki Okatani

Question-Agnostic Attention for Visual Question Answering

Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from…

Computer Vision and Pattern Recognition · Computer Science 2021-08-26 Moshiur R Farazi, Salman H Khan, Nick Barnes

A Focused Dynamic Attention Model for Visual Question Answering

Visual Question and Answering (VQA) problems are attracting increasing interest from multiple research disciplines. Solving VQA problems requires techniques from both computer vision for understanding the visual contents of a presented…

Computer Vision and Pattern Recognition · Computer Science 2016-04-07 Ilija Ilievski, Shuicheng Yan, Jiashi Feng

Reasoning Over History: Context Aware Visual Dialog

While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge. One way to address this challenge is to…

Computation and Language · Computer Science 2020-11-03 Muhammad A. Shah, Shikib Mehri, Tejas Srinivasan

An Improved Attention for Visual Question Answering

We consider the problem of Visual Question Answering (VQA). Given an image and a free-form, open-ended, question, expressed in natural language, the goal of VQA system is to provide accurate answer to this question with respect to the…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

We address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on…

Computer Vision and Pattern Recognition · Computer Science 2016-03-22 Huijuan Xu, Kate Saenko

Multi-View Attention Network for Visual Dialog

Visual dialog is a challenging vision-language task in which a series of questions visually grounded by a given image are answered. To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question,…

Artificial Intelligence · Computer Science 2020-10-08 Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim

Video Question Answering via Attribute-Augmented Attention Network Learning

Video Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question. However, the existing visual question answering approaches mainly tackle…

Computer Vision and Pattern Recognition · Computer Science 2017-07-21 Yunan Ye, Zhou Zhao, Yimeng Li, Long Chen, Jun Xiao, Yueting Zhuang

Analysis of Visual Question Answering Algorithms with attention model

Visual question answering (VQA) uses image processing algorithms to process the image and natural language processing methods to understand and answer the question. VQA is helpful to a visually impaired person, can be used for the security…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Param Ahir, Hiteishi M. Diwanji

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question…

Computer Vision and Pattern Recognition · Computer Science 2018-07-23 Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concept; such as the relationships between various objects. The limited use of object categories…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

Reference Resolution and Context Change in Multimodal Situated Dialogue for Exploring Data Visualizations

Reference resolution, which aims to identify entities being referred to by a speaker, is more complex in real world settings: new referents may be created by processes the agents engage in and/or be salient only because they belong to the…

Computation and Language · Computer Science 2022-09-07 Abhinav Kumar, Barbara Di Eugenio, Abari Bhattacharya, Jillian Aurisano, Andrew Johnson

High-Order Attention Models for Visual Question Answering

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual…

Computer Vision and Pattern Recognition · Computer Science 2017-11-15 Idan Schwartz, Alexander G. Schwing, Tamir Hazan