Entity-aware Image Caption Generation

Computation and Language 2018-11-08 v2

Abstract

Current image captioning approaches generate descriptions that lack specific information, such as the named entities involved in the images. In this paper we propose a new task that aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural network and long short-term memory (CNN-LSTM) model to generate a template caption based on the input image. Then we use a knowledge-graph-based collective inference algorithm to fill in the template with specific named entities retrieved via the hashtags. Experiments on a new benchmark dataset collected from Flickr show that our model generates news-style image descriptions with much richer information. Our model significantly outperforms unimodal baselines on various evaluation metrics.
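The template-filling stage can be illustrated with a toy sketch. Everything below is hypothetical and is not the paper's collective inference algorithm or data: the knowledge graph edges, the slot names such as `<Person>`, the `related` and `fill_template` helpers, and the scoring rule (count of direct knowledge-graph edges between the chosen entities) are assumptions made for illustration. The sketch assumes candidate entities for each template slot have already been retrieved from the hashtags, then picks jointly, rather than slot by slot, the assignment whose entities are most mutually related.

```python
from itertools import combinations, product

# Hypothetical toy knowledge graph: undirected edges between entity names.
KG_EDGES = {
    frozenset({"Barack Obama", "White House"}),
    frozenset({"Barack Obama", "Washington, D.C."}),
    frozenset({"White House", "Washington, D.C."}),
    frozenset({"Barack Obama Sr.", "Kenya"}),
}


def related(a: str, b: str) -> int:
    """Assumed relatedness score: 1 if the two entities share an edge, else 0."""
    return 1 if frozenset({a, b}) in KG_EDGES else 0


def fill_template(template: str, candidates: dict[str, list[str]]) -> str:
    """Fill every slot by choosing the jointly most coherent set of entities.

    Brute-force collective inference: score each joint assignment by the sum
    of pairwise relatedness and keep the highest-scoring one.
    """
    slots = list(candidates)
    best_choice, best_score = (), -1
    for choice in product(*(candidates[s] for s in slots)):
        score = sum(related(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best_choice, best_score = choice, score
    filled = template
    for slot, entity in zip(slots, best_choice):
        filled = filled.replace(slot, entity)
    return filled


if __name__ == "__main__":
    # Hypothetical template caption and hashtag-derived candidates.
    template = "<Person> speaks at the <Building> in <Location>"
    candidates = {
        "<Person>": ["Barack Obama Sr.", "Barack Obama"],
        "<Building>": ["White House"],
        "<Location>": ["Kenya", "Washington, D.C."],
    }
    print(fill_template(template, candidates))
```

Deciding each slot independently could pair "Barack Obama Sr." with "White House", whereas the joint score prefers the mutually connected triple and prints "Barack Obama speaks at the White House in Washington, D.C.". Exhaustive enumeration grows exponentially with the number of slots, so a real system would need a more efficient inference procedure.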

Cite

@article{arxiv.1804.07889,
  title  = {Entity-aware Image Caption Generation},
  author = {Di Lu and Spencer Whitehead and Lifu Huang and Heng Ji and Shih-Fu Chang},
  journal= {arXiv preprint arXiv:1804.07889},
  year   = {2018}
}

Comments

In Proceedings of EMNLP 2018
