Related papers: Entity-aware Image Caption Generation

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article. This task remains challenging as it is difficult to learn the association between…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Wentian Zhao, Yao Hu, Heda Wang, Xinxiao Wu, Jiebo Luo

Integrating Image Captioning with Rule-based Entity Masking

Given an image, generating its natural language description (i.e., caption) is a well studied problem. Approaches proposed to address this problem usually rely on image features that are difficult to interpret. Particularly, these image…

Computer Vision and Pattern Recognition · Computer Science 2020-07-24 Aditya Mogadala, Xiaoyu Shen, Dietrich Klakow

Guiding Long-Short Term Memory for Image Caption Generation

In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as…

Computer Vision and Pattern Recognition · Computer Science 2015-09-17 Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information, even to the extent of inventing plausible explanations when contextual information and images do not match. In…

Computer Vision and Pattern Recognition · Computer Science 2022-09-22 Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

ICECAP: Information Concentrated Entity-aware Image Captioning

Most current image captioning systems focus on describing general image content, and lack background knowledge to deeply understand the image, such as exact named entities or concrete events. In this work, we focus on the entity-aware news…

Computer Vision and Pattern Recognition · Computer Science 2021-08-05 Anwen Hu, Shizhe Chen, Qin Jin

EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning

News image captioning requires a model to generate an informative caption rich in entities, given the news image and the associated news article. Current MLLMs still bear limitations in handling entity information in news image captioning…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Xiaojun Wan

Generating Diverse and Meaningful Captions

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Phrase-based Image Captioning with Hierarchical LSTM Model

Automatic generation of captions to describe the content of an image has been gaining a lot of research interest recently, where most of the existing works treat the image caption as pure sequential data. Natural language, however possess a…

Computer Vision and Pattern Recognition · Computer Science 2017-11-16 Ying Hua Tan, Chee Seng Chan

Informative Image Captioning with External Sources of Information

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained…

Computation and Language · Computer Science 2019-06-24 Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

Generating image captions with external encyclopedic knowledge

Accurately reporting what objects are depicted in an image is largely a solved problem in automatic caption generation. The next big challenge on the way to truly humanlike captioning is being able to incorporate the context of the image…

Computation and Language · Computer Science 2022-10-11 Sofia Nikiforova, Tejaswini Deoskar, Denis Paperno, Yoad Winter

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a potential research topic in artificial intelligence, which has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

Show, Edit and Tell: A Framework for Editing Image Captions

Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when…

Computer Vision and Pattern Recognition · Computer Science 2020-03-09 Fawaz Sammani, Luke Melas-Kyriazi

Video Summarization: Towards Entity-Aware Captions

Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

How to Understand Named Entities: Using Common Sense for News Captioning

News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to…

Computation and Language · Computer Science 2024-03-12 Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a…

Computation and Language · Computer Science 2015-04-10 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang, Antoni B. Chan

Good News, Everyone! Context driven entity-aware captioning for news images

Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-03 Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

Retrieval-Augmented Transformer for Image Captioning

Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Attentive Semantic Video Generation using Captions

This paper proposes a network architecture to perform variable length semantic video generation using captions. We adopt a new perspective towards video generation where we allow the captions to be combined with the long-term and short-term…

Computer Vision and Pattern Recognition · Computer Science 2017-11-17 Tanya Marwah, Gaurav Mittal, Vineeth N. Balasubramanian

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

A picture is worth a thousand words. Not until recently, however, we noticed some success stories in understanding of visual scenes: a model that is able to detect/name objects, describe their attributes, and recognize their…

Computation and Language · Computer Science 2017-10-27 Ying Hua Tan, Chee Seng Chan