Related papers: Curriculum Learning for Compositional Visual Reaso…

200 papers

Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce…

Computation and Language · Computer Science 2023-10-25 Wafa Aissa, Marin Ferecatu, Michel Crucianu

Visual Question Answering (VQA) is a multi-discipline research task. To produce the right answer, it requires an understanding of the visual content of images, the natural language questions, as well as commonsense reasoning over the…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Yao Zhang, Haokun Chen, Ahmed Frikha, Yezi Yang, Denis Krompass, Gengyuan Zhang, Jindong Gu, Volker Tresp

Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions can't be directly or clearly answered from visual content but require reasoning from structured human knowledge with…

Computer Vision and Pattern Recognition · Computer Science 2018-06-14 Zhou Su, Chen Zhu, Yinpeng Dong, Dongqi Cai, Yurong Chen, Jianguo Li

Visual Question Answering (VQA) models have achieved significant success in recent times. Despite the success of VQA models, they are mostly black-box models providing no reasoning about the predicted answer, thus raising questions for…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Nihar Bendre, Kevin Desai, Peyman Najafirad

Visual Question Answering (VQA) has emerged as one of the most challenging tasks in artificial intelligence due to its multi-modal nature. However, most existing VQA methods are incapable of handling Knowledge-based Visual Question…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Jian Tang

Visual Question Answering (VQA) is a challenge task that combines natural language processing and computer vision techniques and gradually becomes a benchmark test task in multimodal large language models (MLLMs). The goal of our survey is…

Computation and Language · Computer Science 2024-11-27 Jiayi Kuang, Jingyou Xie, Haohao Luo, Ronghao Li, Zhe Xu, Xianfeng Cheng, Yinghui Li, Xika Lin, Ying Shen

Visual reasoning tasks such as visual question answering (VQA) require an interplay of visual perception with reasoning about the question semantics grounded in perception. However, recent advances in this area are still primarily driven by…

Machine Learning · Computer Science 2020-08-27 Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

Having revolutionized natural language processing (NLP) applications, large language models (LLMs) are expanding into the realm of multimodal inputs. Owing to their ability to interpret images, multimodal LLMs (MLLMs) have been primarily…

Computer Vision and Pattern Recognition · Computer Science 2024-02-14 Jusung Lee, Sungguk Cha, Younghyun Lee, Cheoljong Yang

In order to achieve a general visual question answering (VQA) system, it is essential to learn to answer deeper questions that require compositional reasoning on the image and external knowledge. Meanwhile, the reasoning process should be…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Zihao Zhu

Numerical reasoning skills are essential for complex question answering (CQA) over text. It requires operations including counting, comparison, addition and subtraction. A successful approach to CQA on text, Neural Module Networks (NMNs),…

Computation and Language · Computer Science 2021-09-07 Xiao-Yu Guo, Yuan-Fang Li, Gholamreza Haffari

Foundation models for vision have transformed visual recognition with powerful pretrained representations and strong zero-shot capabilities, yet their potential for data-efficient learning remains largely untapped. Active Learning (AL) aims…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Tobias Glück, Anke Schmeink, Andreas Kugi

Neural Module Networks (NMNs) aim at Visual Question Answering (VQA) via composition of modules that tackle a sub-task. NMNs are a promising strategy to achieve systematic generalization, i.e., overcoming biasing factors in the training…

Machine Learning · Computer Science 2022-01-19 Vanessa D'Amario, Tomotake Sasaki, Xavier Boix

The collaborative reasoning for understanding each image-question pair is very critical but under-explored for an interpretable Visual Question Answering (VQA) system. Although very recent works also tried the explicit compositional…

Computer Vision and Pattern Recognition · Computer Science 2018-04-03 Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng

Visual question answering (VQA) is crucial for promoting surgical education. In practice, the needs of trainees are constantly evolving, such as learning more surgical types, adapting to different robots, and learning new surgical…

Information Retrieval · Computer Science 2024-10-24 Yuyang Du, Kexin Chen, Yue Zhan, Chang Han Low, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng

Existing Multimodal Large Language Models (MLLMs) and Visual Language Pretrained Models (VLPMs) have shown remarkable performances in the general Visual Question Answering (VQA). However, these models struggle with VQA questions that…

Computation and Language · Computer Science 2024-11-06 Shuo Yang, Siwen Luo, Soyeon Caren Han

Recently, large multi-modal models (LMMs) have emerged with the capacity to perform vision tasks such as captioning and visual question answering (VQA) with unprecedented accuracy. Applications such as helping the blind or visually impaired…

Computation and Language · Computer Science 2024-06-04 Julian Martin Eisenschlos, Hernán Maina, Guido Ivetta, Luciana Benotti

Visual Question Answering (VQA) is the task of answering a question about an image and requires processing multimodal input and reasoning to obtain the answer. Modular solutions that use declarative representations within the reasoning…

Artificial Intelligence · Computer Science 2024-10-15 Thomas Eiter, Jan Hadl, Nelson Higuera, Johannes Oetsch

Neural module networks (NMN) have achieved success in image-grounded tasks such as Visual Question Answering (VQA) on synthetic images. However, very limited work on NMN has been studied in the video-grounded dialogue tasks. These tasks…

Computer Vision and Pattern Recognition · Computer Science 2022-06-14 Hung Le, Nancy F. Chen, Steven C. H. Hoi

A key aspect of human intelligence is the ability to imagine -- composing learned concepts in novel ways -- to make sense of new scenarios. Such capacity is not yet attained for machine learning systems. In this work, in the context of…

Artificial Intelligence · Computer Science 2023-10-31 Rim Assouel, Pau Rodriguez, Perouz Taslakian, David Vazquez, Yoshua Bengio