Related papers: Hyperparameter Analysis for Image Captioning
Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…
This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ various different…
Image captioning by the encoder-decoder framework has shown tremendous advancement in the last decade where CNN is mainly used as encoder and LSTM is used as a decoder. Despite such an impressive achievement in terms of accuracy in simple…
In a globalized world at the present epoch of generative intelligence, most of the manual labour tasks are automated with increased efficiency. This can support businesses to save time and money. A crucial component of generative…
Modern Neural Networks are eminent in achieving state of the art performance on tasks under Computer Vision, Natural Language Processing and related verticals. However, they are notorious for their voracious memory and compute appetite…
The aim of image captioning is to generate textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently…
Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…
Automated image captioning is one of the applications of Deep Learning which involves fusion of work done in computer vision and natural language processing, and it is typically performed using Encoder-Decoder architectures. In this…
Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: How to decide what to describe and in which order. Inspired by the…
In today's world, image processing plays a crucial role across various fields, from scientific research to industrial applications. But one particularly exciting application is image captioning. The potential impact of effective image…
Image captioning creates informative text from an input image by creating a relationship between the words and the actual content of an image. Recently, deep learning models that utilize transformers have been the most successful in…
Image captioning is a technology that produces text-based descriptions for an image. Deep learning-based solutions built on top of feature recognition may very well serve the purpose. But as with any other machine learning solution, the…
Change captioning is to describe the semantic change between a pair of similar images in natural language. It is more challenging than general image captioning, because it requires capturing fine-grained change information while being…
This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task. We improve Google's CNN-LSTM model by introducing concept-based sentence reranking, a data-driven approach which exploits the large amounts of…
CNN-LSTM based architectures have played an important role in image captioning, but limited by the training efficiency and expression ability, researchers began to explore the CNN-Transformer based models and achieved great success.…
In this paper, we consider the image captioning task from a new sequence-to-sequence prediction perspective and propose CaPtion TransformeR (CPTR) which takes the sequentialized raw images as the input to Transformer. Compared to the…
The Convolutional Neural Network (CNN) has been the dominant image feature extractor in computer vision for years. However, it fails to get the relationship between images/objects and their hierarchical interactions which can be helpful for…
Language and vision are processed as two different modal in current work for image captioning. However, recent work on Super Characters method shows the effectiveness of two-dimensional word embedding, which converts text classification…
Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic…
Image Captioning (IC) has achieved astonishing developments by incorporating various techniques into the CNN-RNN encoder-decoder architecture. However, since CNN and RNN do not share the basic network component, such a heterogeneous…