Related papers: Boosting Image Captioning with Attributes

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-18 · Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Exploring Visual Relationship for Image Captioning

It is widely believed that modeling relationships between objects would be helpful for representing and eventually describing an image. Nevertheless, there has been no evidence supporting this idea for image description generation.…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-20 · Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the fields of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-24 · Qingzhong Wang, Antoni B. Chan

Pointing Novel Objects in Image Captioning

Image captioning has received significant attention with remarkable improvements in recent advances. Nevertheless, images in the wild encapsulate rich knowledge and cannot be sufficiently described with models built on image-caption pairs…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-26 · Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

Video Captioning with Transferred Semantic Attributes

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community. Most recent progress in this problem has been achieved through employing 2-D and/or 3-D Convolutional Neural…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-24 · Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

Convolutional Image Captioning

Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In…

Computer Vision and Pattern Recognition · Computer Science · 2017-11-28 · Jyoti Aneja, Aditya Deshpande, Alexander Schwing

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-23 · Sulabh Katiyar, Samir Kumar Borgohain

Neural Image Captioning

In recent years, the biggest advances in major Computer Vision tasks, such as object recognition, handwritten-digit identification, facial recognition, and many others, have all come through the use of Convolutional Neural Networks (CNNs).…

Computation and Language · Computer Science · 2019-07-05 · Elaina Tan, Lakshay Sharma

Image Captioning with Object Detection and Localization

Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that…

Computer Vision and Pattern Recognition · Computer Science · 2017-06-09 · Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang

Guiding Long-Short Term Memory for Image Caption Generation

In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as…

Computer Vision and Pattern Recognition · Computer Science · 2015-09-17 · Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars

Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style

Image captioning is a research hotspot where encoder-decoder models combining convolutional neural network (CNN) and long short-term memory (LSTM) achieve promising results. Despite significant progress, these models generate sentences…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-16 · Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Amanuel Tafese Dufera

Towards Retrieval-Augmented Architectures for Image Captioning

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-24 · Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

Retrieval-Augmented Transformer for Image Captioning

Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-23 · Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long-Short Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-21 · Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

A picture is worth a thousand words. Only recently, however, have we seen some success stories in understanding visual scenes: a model that is able to detect/name objects, describe their attributes, and recognize their…

Computation and Language · Computer Science · 2017-10-27 · Ying Hua Tan, Chee Seng Chan

Phrase-based Image Captioning with Hierarchical LSTM Model

Automatic generation of captions to describe the content of an image has been gaining a lot of research interest recently, where most of the existing works treat the image caption as pure sequential data. Natural language, however, possesses a…

Computer Vision and Pattern Recognition · Computer Science · 2017-11-16 · Ying Hua Tan, Chee Seng Chan

Learning to Collocate Neural Modules for Image Captioning

We do not speak word by word from scratch; our brain quickly structures a pattern like STH DO STH AT SOMEPLACE and then fills in the detailed descriptions. To render existing encoder-decoder image captioners such human-like…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-19 · Xu Yang, Hanwang Zhang, Jianfei Cai

Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition

Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they are able to provide the learned dynamically changing contextual window of all…

Computation and Language · Computer Science · 2016-10-12 · Xiangang Li, Xihong Wu

Entity-aware Image Caption Generation

Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images. In this paper we propose a new task which aims to generate informative image captions, given…

Computation and Language · Computer Science · 2018-11-08 · Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang