Related papers: Hyperparameter Analysis for Image Captioning

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Amanuel Tafese Dufera

Automated Image Captioning with CNNs and Transformers

This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ various different…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Joshua Adrian Cahyono, Jeremy Nathan Jusuf

An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)

Image captioning by the encoder-decoder framework has shown tremendous advancement in the last decade where CNN is mainly used as encoder and LSTM is used as a decoder. Despite such an impressive achievement in terms of accuracy in simple…

Computer Vision and Pattern Recognition · Computer Science 2023-01-09 Rana Adnan Ahmad, Muhammad Azhar, Hina Sattar

Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning

In a globalized world at the present epoch of generative intelligence, most of the manual labour tasks are automated with increased efficiency. This can support businesses to save time and money. A crucial component of generative…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Pranav Dandwate, Chaitanya Shahane, Vandana Jagtap, Shridevi C. Karande

Efficient CNN-LSTM based Image Captioning using Neural Network Compression

Modern Neural Networks are eminent in achieving state of the art performance on tasks under Computer Vision, Natural Language Processing and related verticals. However, they are notorious for their voracious memory and compute appetite…

Computer Vision and Pattern Recognition · Computer Science 2020-12-18 Harshit Rampal, Aman Mohanty

Attention Beam: An Image Captioning Approach

The aim of image captioning is to generate textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently…

Computer Vision and Pattern Recognition · Computer Science 2020-11-12 Anubhav Shrimal, Tanmoy Chakraborty

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang, Antoni B. Chan

Empirical Analysis of Image Caption Generation using Deep Learning

Automated image captioning is one of the applications of Deep Learning which involves fusion of work done in computer vision and natural language processing, and it is typically performed using Encoder-Decoder architectures. In this…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Aditya Bhattacharya, Eshwar Shamanna Girishekar, Padmakar Anil Deshpande

Image Captioning through Image Transformer

Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: How to decide what to describe and in which order. Inspired by the…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

Compressed Image Captioning using CNN-based Encoder-Decoder Framework

In today's world, image processing plays a crucial role across various fields, from scientific research to industrial applications. But one particularly exciting application is image captioning. The potential impact of effective image…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Md Alif Rahman Ridoy, M Mahmud Hasan, Shovon Bhowmick

An Ensemble Model with Attention Based Mechanism for Image Captioning

Image captioning creates informative text from an input image by creating a relationship between the words and the actual content of an image. Recently, deep learning models that utilize transformers have been the most successful in…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Israa Al Badarneh, Bassam Hammo, Omar Al-Kadi

Explainable Image Captioning using CNN-CNN architecture and Hierarchical Attention

Image captioning is a technology that produces text-based descriptions for an image. Deep learning-based solutions built on top of feature recognition may very well serve the purpose. But as with any other machine learning solution, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Rishi Kesav Mohan, Sanjay Sureshkumar, Vignesh Sivasubramaniam

Neighborhood Contrastive Transformer for Change Captioning

Change captioning is to describe the semantic change between a pair of similar images in natural language. It is more challenging than general image captioning, because it requires capturing fine-grained change information while being…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Yunbin Tu, Liang Li, Li Su, Ke Lu, Qingming Huang

Improving Image Captioning by Concept-based Sentence Reranking

This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task. We improve Google's CNN-LSTM model by introducing concept-based sentence reranking, a data-driven approach which exploits the large amounts of…

Computer Vision and Pattern Recognition · Computer Science 2016-05-04 Xirong Li, Qin Jin

End-to-End Transformer Based Model for Image Captioning

CNN-LSTM based architectures have played an important role in image captioning, but limited by the training efficiency and expression ability, researchers began to explore the CNN-Transformer based models and achieved great success.…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Yiyu Wang, Jungang Xu, Yingfei Sun

CPTR: Full Transformer Network for Image Captioning

In this paper, we consider the image captioning task from a new sequence-to-sequence prediction perspective and propose CaPtion TransformeR (CPTR) which takes the sequentialized raw images as the input to Transformer. Compared to the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-29 Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu

Better Understanding Hierarchical Visual Relationship for Image Caption

The Convolutional Neural Network (CNN) has been the dominant image feature extractor in computer vision for years. However, it fails to get the relationship between images/objects and their hierarchical interactions which can be helpful for…

Computer Vision and Pattern Recognition · Computer Science 2019-12-05 Zheng-cong Fei

SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding

Language and vision are processed as two different modalities in current work for image captioning. However, recent work on Super Characters method shows the effectiveness of two-dimensional word embedding, which converts text classification…

Computation and Language · Computer Science 2019-06-05 Baohua Sun, Lin Yang, Michael Lin, Charles Young, Patrick Dong, Wenhan Zhang, Jason Dong

Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic…

Computer Vision and Pattern Recognition · Computer Science 2015-09-22 Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

Image Captioning In the Transformer Age

Image Captioning (IC) has achieved astonishing developments by incorporating various techniques into the CNN-RNN encoder-decoder architecture. However, since CNN and RNN do not share the basic network component, such a heterogeneous…

Computer Vision and Pattern Recognition · Computer Science 2022-04-18 Yang Xu, Li Li, Haiyang Xu, Songfang Huang, Fei Huang, Jianfei Cai