
Logically Consistent Loss for Visual Question Answering

Computer Vision and Pattern Recognition; Artificial Intelligence · v1, 2020-11-23

Abstract

Given an image, background knowledge, and a set of questions about an object, human learners answer the questions very consistently regardless of question form and semantic task. Current neural-network-based Visual Question Answering (VQA) models, despite their impressive performance, cannot ensure such consistency because of the independent and identically distributed (i.i.d.) assumption. We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in the multi-task learning framework, together with data organisations called family-batch and hybrid-batch. To demonstrate the usefulness of this proposal, we train and evaluate MAC-net-based VQA machines with and without the proposed logically consistent loss and the proposed data organisation. The experiments confirm that the proposed loss formulae and the introduction of hybrid-batch lead to more consistency as well as better performance. Although the proposed approach is tested with MAC-net, it can be used with any other QA method wherever logical consistency between answers exists.
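The following Python code makes the idea of a logically consistent loss in a multi-task setting more concrete. It is a minimal sketch built on assumptions of our own and is not the formulation from the paper. It assumes two hypothetical question types about the same object:

- an "exists" question, answered yes/no;
- a "count" question, answered with a distribution over counts 0..K.

It also assumes a logical rule linking them: "exists" should be yes exactly when the count is greater than zero. The names `consistency_penalty`, `total_loss` and the weight `lam` are illustrative only.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def consistency_penalty(p_exists_yes, p_count):
    """Hypothetical penalty for the rule: 'exists' is yes iff count > 0.

    p_exists_yes: predicted probability that the 'exists' answer is yes.
    p_count: predicted probability distribution over counts 0..K.
    The count head implies P(yes) = 1 - P(count == 0); the penalty is the
    squared gap between the two heads' views of the same fact.
    """
    implied_yes = 1.0 - p_count[0]
    return (p_exists_yes - implied_yes) ** 2


def total_loss(p_exists, y_exists, p_count, y_count, lam=0.5):
    """Per-task cross-entropies plus a weighted consistency term.

    p_exists: [P(no), P(yes)]; y_exists: 0 (no) or 1 (yes).
    p_count: distribution over counts; y_count: true count index.
    lam: hypothetical weight balancing accuracy against consistency.
    """
    task_loss = cross_entropy(p_exists, y_exists) + cross_entropy(p_count, y_count)
    return task_loss + lam * consistency_penalty(p_exists[1], p_count)


if __name__ == "__main__":
    p_count = [0.1, 0.6, 0.3]  # count head says "at least one" with prob. 0.9
    # Consistent: the exists head also says yes with prob. 0.9.
    print(round(consistency_penalty(0.9, p_count), 4))  # → 0.0
    # Inconsistent: the exists head says yes with prob. only 0.1.
    print(round(consistency_penalty(0.1, p_count), 4))  # → 0.64
    print(total_loss([0.9, 0.1], 0, p_count, 1))
```

The squared difference is just one simple differentiable choice. It is zero when the two heads agree and grows as their implied answers diverge, so adding it to the multi-task objective pushes the model toward answers that respect the rule. Such a penalty can only be computed when the related questions about the same object are evaluated together, for example in the same batch.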


Cite

@article{arxiv.2011.10094,
  title  = {Logically Consistent Loss for Visual Question Answering},
  author = {Anh-Cat Le-Ngo and Truyen Tran and Santu Rana and Sunil Gupta and Svetha Venkatesh},
  journal= {arXiv preprint arXiv:2011.10094},
  year   = {2020}
}

Comments

10 pages, 6 figures, 9 tables
