Related papers: Neural Image Captioning

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang , Antoni B. Chan

Convolutional Image Captioning

Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In…

Computer Vision and Pattern Recognition · Computer Science 2017-11-28 Jyoti Aneja , Aditya Deshpande , Alexander Schwing

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Amanuel Tafese Dufera

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Much recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic…

Computer Vision and Pattern Recognition · Computer Science 2016-12-19 Qi Wu , Chunhua Shen , Anton van den Hengel , Peng Wang , Anthony Dick

An Empirical Study of Language CNN for Image Captioning

Language Models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a Language CNN model which is suitable for statistical language modeling tasks and shows competitive…

Computer Vision and Pattern Recognition · Computer Science 2017-08-03 Jiuxiang Gu , Gang Wang , Jianfei Cai , Tsuhan Chen

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A) - a novel…

Computer Vision and Pattern Recognition · Computer Science 2016-11-08 Ting Yao , Yingwei Pan , Yehao Li , Zhaofan Qiu , Tao Mei

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long-Short Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Cheng Wang , Haojin Yang , Christian Bartz , Christoph Meinel

Image Captioning using Deep Neural Architectures

Automatically creating the description of an image using any natural languages sentence like English is a very challenging task. It requires expertise of both image processing as well as natural language processing. This paper discuss about…

Computer Vision and Pattern Recognition · Computer Science 2018-10-03 Parth Shah , Vishvajit Bakarola , Supriya Pati

Compressed Image Captioning using CNN-based Encoder-Decoder Framework

In today's world, image processing plays a crucial role across various fields, from scientific research to industrial applications. But one particularly exciting application is image captioning. The potential impact of effective image…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Md Alif Rahman Ridoy , M Mahmud Hasan , Shovon Bhowmick

Advancements in Image Classification using Convolutional Neural Network

Convolutional Neural Network (CNN) is the state-of-the-art for image classification task. Here we have briefly discussed different components of CNN. In this paper, We have explained different CNN architectures for image classification.…

Computer Vision and Pattern Recognition · Computer Science 2019-05-27 Farhana Sultana , A. Sufian , Paramartha Dutta

Retrieval-Augmented Transformer for Image Captioning

Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Sara Sarto , Marcella Cornia , Lorenzo Baraldi , Rita Cucchiara

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects…

Computer Vision and Pattern Recognition · Computer Science 2017-08-18 Ting Yao , Yingwei Pan , Yehao Li , Tao Mei

Efficient CNN-LSTM based Image Captioning using Neural Network Compression

Modern Neural Networks are eminent in achieving state of the art performance on tasks under Computer Vision, Natural Language Processing and related verticals. However, they are notorious for their voracious memory and compute appetite…

Computer Vision and Pattern Recognition · Computer Science 2020-12-18 Harshit Rampal , Aman Mohanty

Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition

Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they are able to provide the learned dynamically changing contextual window of all…

Computation and Language · Computer Science 2016-10-12 Xiangang Li , Xihong Wu

Language Models for Image Captioning: The Quirks and What Works

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum…

Computation and Language · Computer Science 2015-10-16 Jacob Devlin , Hao Cheng , Hao Fang , Saurabh Gupta , Li Deng , Xiaodong He , Geoffrey Zweig , Margaret Mitchell

A Critical Review of Recurrent Neural Networks for Sequence Learning

Countless learning tasks require dealing with sequential data. Image captioning, speech synthesis, and music generation all require that a model produce outputs that are sequences. In other domains, such as time series prediction, video…

Machine Learning · Computer Science 2015-10-20 Zachary C. Lipton , John Berkowitz , Charles Elkan

Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization

Automatically generating the descriptions of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence, which bridges the gap between computer vision and natural language processing. Based on the…

Computer Vision and Pattern Recognition · Computer Science 2019-01-14 Shiyang Yan , Yuan Xie , Fangyu Wu , Jeremy S. Smith , Wenjin Lu , Bailing Zhang

SLCNN: Sentence-Level Convolutional Neural Network for Text Classification

Text classification is a fundamental task in natural language processing (NLP). Several recent studies show the success of deep learning on text processing. Convolutional neural network (CNN), as a popular deep learning model, has shown…

Computation and Language · Computer Science 2023-01-30 Ali Jarrahi , Ramin Mousa , Leila Safari

MAT: A Multimodal Attentive Translator for Image Captioning

In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural networks (RNN) model for image caption generation. Different…

Computer Vision and Pattern Recognition · Computer Science 2017-08-11 Chang Liu , Fuchun Sun , Changhu Wang , Feng Wang , Alan Yuille