Related papers: Compressed Image Captioning using CNN-based Encode…

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar , Hafida Benhidour , Said Kerrache

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Swadhin Das , Saarthak Gupta , Kamal Kumar , Raksha Sharma

Empirical Analysis of Image Caption Generation using Deep Learning

Automated image captioning is one of the applications of Deep Learning which involves fusion of work done in computer vision and natural language processing, and it is typically performed using Encoder-Decoder architectures. In this…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Aditya Bhattacharya , Eshwar Shamanna Girishekar , Padmakar Anil Deshpande

Recurrent Fusion Network for Image Captioning

Recently, much advance has been made in image captioning, and an encoder-decoder framework has been adopted by all the state-of-the-art models. Under this framework, an input image is encoded by a convolutional neural network (CNN) and then…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Wenhao Jiang , Lin Ma , Yu-Gang Jiang , Wei Liu , Tong Zhang

Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network

Automatic Image Captioning is the never-ending effort of creating syntactically and validating the accuracy of textual descriptions of an image in natural language with context. The encoder-decoder structure used throughout existing Bengali…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Md Aminul Haque Palash , MD Abdullah Al Nasim , Sourav Saha , Faria Afrin , Raisa Mallik , Sathishkumar Samiappan

Better Understanding Hierarchical Visual Relationship for Image Caption

The Convolutional Neural Network (CNN) has been the dominant image feature extractor in computer vision for years. However, it fails to get the relationship between images/objects and their hierarchical interactions which can be helpful for…

Computer Vision and Pattern Recognition · Computer Science 2019-12-05 Zheng-cong Fei

Automated Image Captioning with CNNs and Transformers

This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ various different…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Joshua Adrian Cahyono , Jeremy Nathan Jusuf

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang , Antoni B. Chan

An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)

Image captioning by the encoder-decoder framework has shown tremendous advancement in the last decade where CNN is mainly used as encoder and LSTM is used as a decoder. Despite such an impressive achievement in terms of accuracy in simple…

Computer Vision and Pattern Recognition · Computer Science 2023-01-09 Rana Adnan Ahmad , Muhammad Azhar , Hina Sattar

Image to Language Understanding: Captioning approach

Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representation of such a format in Natural Language has a huge variety of applications such as helping the visually impaired etc.…

Computer Vision and Pattern Recognition · Computer Science 2020-02-25 Madhavan Seshadri , Malavika Srikanth , Mikhail Belov

Image Captioning based on Deep Learning Methods: A Survey

Image captioning is a challenging task and attracting more and more attention in the field of Artificial Intelligence, and which can be applied to efficient image retrieval, intelligent blind guidance and human-computer interaction, etc. In…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Yiyu Wang , Jungang Xu , Yingfei Sun , Ben He

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren , Xiaoyu Wang , Ning Zhang , Xutao Lv , Li-Jia Li

Image Captioning In the Transformer Age

Image Captioning (IC) has achieved astonishing developments by incorporating various techniques into the CNN-RNN encoder-decoder architecture. However, since CNN and RNN do not share the basic network component, such a heterogeneous…

Computer Vision and Pattern Recognition · Computer Science 2022-04-18 Yang Xu , Li Li , Haiyang Xu , Songfang Huang , Fei Huang , Jianfei Cai

Semantic Perceptual Image Compression using Deep Convolution Networks

It has long been considered a significant problem to improve the visual quality of lossy image and video compression. Recent advances in computing power together with the availability of large training data sets has increased interest in…

Multimedia · Computer Science 2017-03-30 Aaditya Prakash , Nick Moran , Solomon Garber , Antonella DiLillo , James Storer

Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention

Image captioning is a technology that produces text-based descriptions for an image. Deep learning-based solutions built on top of feature recognition may very well serve the purpose. But as with any other machine learning solution, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Rishi Kesav Mohan , Sanjay Sureshkumar , Vignesh Sivasubramaniam

End-to-End Video Captioning

Building correspondences across different modalities, such as video and language, has recently become critical in many visual recognition applications, such as video captioning. Inspired by machine translation, recent models tackle this…

Computer Vision and Pattern Recognition · Computer Science 2019-11-11 Silvio Olivastri , Gurkirt Singh , Fabio Cuzzolin

Neural Image Captioning

In recent years, the biggest advances in major Computer Vision tasks, such as object recognition, handwritten-digit identification, facial recognition, and many others., have all come through the use of Convolutional Neural Networks (CNNs).…

Computation and Language · Computer Science 2019-07-05 Elaina Tan , Lakshay Sharma

AutoCaption: Image Captioning with Neural Architecture Search

Image captioning transforms complex visual information into abstract natural language for representation, which can help computers understanding the world quickly. However, due to the complexity of the real environment, it needs to identify…

Computer Vision and Pattern Recognition · Computer Science 2021-11-19 Xinxin Zhu , Weining Wang , Longteng Guo , Jing Liu

Image Captioning based on Deep Reinforcement Learning

Recently it has shown that the policy-gradient methods for reinforcement learning have been utilized to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi , Peng Li , Bo Wang , Zhenyu Wang