Object-based reasoning in VQA

Computer Vision and Pattern Recognition 2018-01-31 v1

Abstract

Visual Question Answering (VQA) is a novel problem domain in which multi-modal inputs must be processed to solve a task posed in natural language. Because solutions inherently require combining visual and natural language processing with abstract reasoning, the problem is considered AI-complete. Recent advances indicate that using high-level, abstract facts extracted from the inputs might facilitate reasoning. Following that direction, we developed a solution combining state-of-the-art object detection and reasoning modules. The results, achieved on the well-balanced CLEVR dataset, confirm this promise and show significant improvements in accuracy, of a few percent, on the complex "counting" task.

Cite

@article{arxiv.1801.09718,
  title  = {Object-based reasoning in VQA},
  author = {Mikyas T. Desta and Larry Chen and Tomasz Kornuta},
  journal= {arXiv preprint arXiv:1801.09718},
  year   = {2018}
}

Comments

10 pages, 15 figures, published as a conference paper at 2018 IEEE Winter Conf. on Applications of Computer Vision (WACV'2018)
