iCap: Interactive Image Captioning with Predictive Text

Human-Computer Interaction 2020-02-25 v3 Computer Vision and Pattern Recognition

Abstract

In this paper we study a new topic: interactive image captioning with a human in the loop. Unlike automated image captioning, where a given test image is the sole input at the inference stage, the interactive scenario gives us access to both the test image and a sequence of (incomplete) user-input sentences. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose asynchronous bidirectional decoding for image caption completion (ABD-Cap). With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text in response to live input from a user. Experiments covering both automated evaluations and real user studies show the viability of our proposals.

Cite

@article{arxiv.2001.11782,
  title  = {iCap: Interactive Image Captioning with Predictive Text},
  author = {Zhengxiong Jia and Xirong Li},
  journal= {arXiv preprint arXiv:2001.11782},
  year   = {2020}
}