Related papers: A Deep Decoder Structure Based on WordEmbedding Re…

Analysis of Convolutional Decoder for Image Caption Generation

Recently Convolutional Neural Networks have been proposed for Sequence Modelling tasks such as Image Caption Generation. However, unlike Recurrent Neural Networks, the performance of Convolutional Neural Networks as Decoders for Image…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Sulabh Katiyar , Samir Kumar Borgohain

Image Captioning based on Deep Learning Methods: A Survey

Image captioning is a challenging task and attracting more and more attention in the field of Artificial Intelligence, and which can be applied to efficient image retrieval, intelligent blind guidance and human-computer interaction, etc. In…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Yiyu Wang , Jungang Xu , Yingfei Sun , Ben He

Exploring Discrete Diffusion Models for Image Captioning

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed the name DDCap, to allow more decoding flexibility. Unlike image…

Computer Vision and Pattern Recognition · Computer Science 2022-12-12 Zixin Zhu , Yixuan Wei , Jianfeng Wang , Zhe Gan , Zheng Zhang , Le Wang , Gang Hua , Lijuan Wang , Zicheng Liu , Han Hu

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren , Xiaoyu Wang , Ning Zhang , Xutao Lv , Li-Jia Li

A sequential guiding network with attention for image captioning

The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, by which we can deal with more challenging tasks such as automatic description…

Computer Vision and Pattern Recognition · Computer Science 2019-02-12 Daouda Sow , Zengchang Qin , Mouhamed Niasse , Tao Wan

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

Neural Attention for Image Captioning: Review of Outstanding Methods

Image captioning is the task of automatically generating sentences that describe an input image in the best way possible. The most successful techniques for automatically generating image captions have recently used attentive deep learning…

Computer Vision and Pattern Recognition · Computer Science 2021-12-01 Zanyar Zohourianshahzadi , Jugal K. Kalita

An Ensemble Model with Attention Based Mechanism for Image Captioning

Image captioning creates informative text from an input image by creating a relationship between the words and the actual content of an image. Recently, deep learning models that utilize transformers have been the most successful in…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Israa Al Badarneh , Bassam Hammo , Omar Al-Kadi

Look and Modify: Modification Networks for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely used for image captioning. Many of these frameworks deploy their full focus on generating the caption from scratch by relying solely on the image features or the object…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Fawaz Sammani , Mahmoud Elsayed

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen , Aurelien Lucchi , Thomas Hofmann

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar , Hafida Benhidour , Said Kerrache

A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

Image captioning is a challenging task involving generating a textual description for an image using computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation using…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Rashid Khan , Bingding Huang , Haseeb Hassan , Asim Zaman , Zhongfu Ye

Empirical Analysis of Image Caption Generation using Deep Learning

Automated image captioning is one of the applications of Deep Learning which involves fusion of work done in computer vision and natural language processing, and it is typically performed using Encoder-Decoder architectures. In this…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Aditya Bhattacharya , Eshwar Shamanna Girishekar , Padmakar Anil Deshpande

Fast Image Caption Generation with Position Alignment

Recent neural network models for image captioning usually employ an encoder-decoder architecture, where the decoder adopts a recursive sequence decoding way. However, such autoregressive decoding may result in sequential error accumulation…

Computer Vision and Pattern Recognition · Computer Science 2019-12-16 Zheng-cong Fei

Reflective Decoding Network for Image Captioning

State-of-the-art image captioning methods mostly focus on improving visual features, less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary…

Computer Vision and Pattern Recognition · Computer Science 2019-09-02 Lei Ke , Wenjie Pei , Ruiyu Li , Xiaoyong Shen , Yu-Wing Tai

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space. Autoencoder-based language models are appealing in dense retrieval as they train the encoder to output high-quality…

Machine Learning · Computer Science 2021-09-17 Shuqi Lu , Di He , Chenyan Xiong , Guolin Ke , Waleed Malik , Zhicheng Dou , Paul Bennett , Tieyan Liu , Arnold Overwijk

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding…

Machine Learning · Computer Science 2014-11-11 Ryan Kiros , Ruslan Salakhutdinov , Richard S. Zemel

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Image captioning is a fast-growing research field of computer vision and natural language processing that involves creating text explanations for images. This study aims to develop a system that uses a pre-trained convolutional neural…

Computation and Language · Computer Science 2022-03-04 Rashid Khan , M Shujah Islam , Khadija Kanwal , Mansoor Iqbal , Md. Imran Hossain , Zhongfu Ye

An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)

Image captioning by the encoder-decoder framework has shown tremendous advancement in the last decade where CNN is mainly used as encoder and LSTM is used as a decoder. Despite such an impressive achievement in terms of accuracy in simple…

Computer Vision and Pattern Recognition · Computer Science 2023-01-09 Rana Adnan Ahmad , Muhammad Azhar , Hina Sattar

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing…

Computer Vision and Pattern Recognition · Computer Science 2018-03-15 Jiuxiang Gu , Jianfei Cai , Gang Wang , Tsuhan Chen