Related papers: Convolutional Image Captioning

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-24 · Qingzhong Wang, Antoni B. Chan

Neural Image Captioning

In recent years, the biggest advances in major Computer Vision tasks, such as object recognition, handwritten-digit identification, facial recognition, and many others, have all come through the use of Convolutional Neural Networks (CNNs).…

Computation and Language · Computer Science · 2019-07-05 · Elaina Tan, Lakshay Sharma

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-21 · Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-23 · Sulabh Katiyar, Samir Kumar Borgohain

Boosting Image Captioning with Attributes

Automatically describing an image in natural language has been an emerging challenge in both the fields of computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A) - a novel…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-08 · Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style

Image captioning is a research hotspot where encoder-decoder models combining convolutional neural network (CNN) and long short-term memory (LSTM) achieve promising results. Despite significant progress, these models generate sentences…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-16 · Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Amanuel Tafese Dufera

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-23 · Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

A sequential guiding network with attention for image captioning

The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, by which we can deal with more challenging tasks such as automatic description…

Computer Vision and Pattern Recognition · Computer Science · 2019-02-12 · Daouda Sow, Zengchang Qin, Mouhamed Niasse, Tao Wan

Guiding Long-Short Term Memory for Image Caption Generation

In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as…

Computer Vision and Pattern Recognition · Computer Science · 2015-09-17 · Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-18 · Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Exploring Visual Relationship for Image Captioning

It is widely believed that modeling relationships between objects would be helpful for representing and eventually describing an image. Nevertheless, there has not been evidence in support of the idea on image description generation.…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-20 · Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Image Representations and New Domains in Neural Image Captioning

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a…

Computation and Language · Computer Science · 2015-08-11 · Jack Hessel, Nicolas Savva, Michael J. Wilber

Experimenting with Self-Supervision using Rotation Prediction for Image Captioning

Image captioning is a task in the field of Artificial Intelligence that merges computer vision and natural language processing. It is responsible for generating legends that describe images, and has various applications like…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-29 · Ahmed Elhagry, Karima Kadaoui

MAT: A Multimodal Attentive Translator for Image Captioning

In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Different…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-11 · Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille

Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning

In a globalized world at the present epoch of generative intelligence, most of the manual labour tasks are automated with increased efficiency. This can support businesses to save time and money. A crucial component of generative…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-07 · Pranav Dandwate, Chaitanya Shahane, Vandana Jagtap, Shridevi C. Karande

Compressed Image Captioning using CNN-based Encoder-Decoder Framework

In today's world, image processing plays a crucial role across various fields, from scientific research to industrial applications. But one particularly exciting application is image captioning. The potential impact of effective image…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-30 · Md Alif Rahman Ridoy, M Mahmud Hasan, Shovon Bhowmick

Image Captioning through Image Transformer

Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: How to decide what to describe and in which order. Inspired by the…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-06 · Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

Analysis of Convolutional Decoder for Image Caption Generation

Recently Convolutional Neural Networks have been proposed for Sequence Modelling tasks such as Image Caption Generation. However, unlike Recurrent Neural Networks, the performance of Convolutional Neural Networks as Decoders for Image…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-09 · Sulabh Katiyar, Samir Kumar Borgohain

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem nevertheless is not trivial especially when there are multiple descriptive and diverse…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-02 · Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei