Related papers: Exploring Visual Relationship for Image Captioning

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A) - a novel…

Computer Vision and Pattern Recognition · Computer Science 2016-11-08 Ting Yao , Yingwei Pan , Yehao Li , Zhaofan Qiu , Tao Mei

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects…

Computer Vision and Pattern Recognition · Computer Science 2017-08-18 Ting Yao , Yingwei Pan , Yehao Li , Tao Mei

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang , Antoni B. Chan

Pointing Novel Objects in Image Captioning

Image captioning has received significant attention with remarkable improvements in recent advances. Nevertheless, images in the wild encapsulate rich knowledge and cannot be sufficiently described with models built on image-caption pairs…

Computer Vision and Pattern Recognition · Computer Science 2019-04-26 Yehao Li , Ting Yao , Yingwei Pan , Hongyang Chao , Tao Mei

Aligning Linguistic Words and Visual Semantic Units for Image Captioning

Image captioning attempts to generate a sentence composed of several linguistic words, which are used to describe objects, attributes, and interactions in an image, denoted as visual semantic units in this paper. Based on this view, we…

Computer Vision and Pattern Recognition · Computer Science 2019-08-07 Longteng Guo , Jing Liu , Jinhui Tang , Jiangwei Li , Wei Luo , Hanqing Lu

Exploring Explicit and Implicit Visual Relationships for Image Captioning

Image captioning is one of the most challenging tasks in AI, which aims to automatically generate textual sentences for an image. Recent methods for image captioning follow encoder-decoder framework that transforms the sequence of salient…

Computer Vision and Pattern Recognition · Computer Science 2021-05-07 Zeliang Song , Xiaofei Zhou

Better Understanding Hierarchical Visual Relationship for Image Caption

The Convolutional Neural Network (CNN) has been the dominant image feature extractor in computer vision for years. However, it fails to get the relationship between images/objects and their hierarchical interactions which can be helpful for…

Computer Vision and Pattern Recognition · Computer Science 2019-12-05 Zheng-cong Fei

Watch What You Just Said: Image Captioning with Text-Conditional Attention

Attention mechanisms have attracted considerable interest in image captioning due to its powerful performance. However, existing methods use only visual content as attention and whether textual context can improve attention in image…

Computer Vision and Pattern Recognition · Computer Science 2016-11-28 Luowei Zhou , Chenliang Xu , Parker Koch , Jason J. Corso

Image Captioning with Object Detection and Localization

Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that…

Computer Vision and Pattern Recognition · Computer Science 2017-06-09 Zhongliang Yang , Yu-Jin Zhang , Sadaqat ur Rehman , Yongfeng Huang

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

Guiding Long-Short Term Memory for Image Caption Generation

In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as…

Computer Vision and Pattern Recognition · Computer Science 2015-09-17 Xu Jia , Efstratios Gavves , Basura Fernando , Tinne Tuytelaars

Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style

Image captioning is a research hotspot where encoder-decoder models combining convolutional neural network (CNN) and long short-term memory (LSTM) achieve promising results. Despite significant progress, these models generate sentences…

Computer Vision and Pattern Recognition · Computer Science 2019-10-16 Hongwei Ge , Zehang Yan , Kai Zhang , Mingde Zhao , Liang Sun

A sequential guiding network with attention for image captioning

The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, by which we can deal with more challenging tasks such as automatic description…

Computer Vision and Pattern Recognition · Computer Science 2019-02-12 Daouda Sow , Zengchang Qin , Mouhamed Niasse , Tao Wan

Convolutional Image Captioning

Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In…

Computer Vision and Pattern Recognition · Computer Science 2017-11-28 Jyoti Aneja , Aditya Deshpande , Alexander Schwing

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long-Short Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Cheng Wang , Haojin Yang , Christian Bartz , Christoph Meinel

Generating Descriptions for Sequential Images with Local-Object Attention and Global Semantic Context Modelling

In this paper, we propose an end-to-end CNN-LSTM model for generating descriptions for sequential images with a local-object attention mechanism. To generate coherent descriptions, we capture global semantic context using a multi-layer…

Computation and Language · Computer Science 2020-12-03 Jing Su , Chenghua Lin , Mian Zhou , Qingyun Dai , Haoyu Lv

OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts

Generating captions for images is a task that has recently received considerable attention. In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their…

Computer Vision and Pattern Recognition · Computer Science 2017-07-25 Xuwang Yin , Vicente Ordonez

Image Captioning: Transforming Objects into Words

Image captioning models typically follow an encoder-decoder architecture which uses abstract image feature vectors as input to the encoder. One of the most successful algorithms uses feature vectors extracted from the region proposals…

Computer Vision and Pattern Recognition · Computer Science 2020-01-14 Simao Herdade , Armin Kappeler , Kofi Boakye , Joao Soares

Object-Level Context Modeling For Scene Classification with Context-CNN

Convolutional Neural Networks (CNNs) have been used extensively for computer vision tasks and produce rich feature representation for objects or parts of an image. But reasoning about scenes requires integration between the low-level…

Computer Vision and Pattern Recognition · Computer Science 2017-06-05 Syed Ashar Javed , Anil Kumar Nelakanti

Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning

Existing image captioning methods just focus on understanding the relationship between objects or instances in a single image, without exploring the contextual correlation existed among contextual image. In this paper, we propose Dual Graph…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Xinzhi Dong , Chengjiang Long , Wenju Xu , Chunxia Xiao