Related papers: Image Captioning through Image Transformer

Image Captioning: Transforming Objects into Words

Image captioning models typically follow an encoder-decoder architecture which uses abstract image feature vectors as input to the encoder. One of the most successful algorithms uses feature vectors extracted from the region proposals…

Computer Vision and Pattern Recognition · Computer Science 2020-01-14 Simao Herdade, Armin Kappeler, Kofi Boakye, Joao Soares

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar, Hafida Benhidour, Said Kerrache

Multimodal Transformer with Multi-View Visual Representation for Image Captioning

Image captioning aims to automatically generate a natural language description of a given image, and most state-of-the-art models have adopted an encoder-decoder framework. The framework consists of a convolutional neural network (CNN)-based…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Jun Yu, Jing Li, Zhou Yu, Qingming Huang

Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Image captioning involves generating textual descriptions from input images, bridging the gap between computer vision and natural language processing. Recent advancements in transformer-based models have significantly improved caption…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Israa A. Albadarneh, Bassam H. Hammo, Omar S. Al-Kadi

Areas of Attention for Image Captioning

We propose "Areas of Attention", a novel attention-based model for automatic image captioning. Our approach models the dependencies between image regions, caption words, and the state of an RNN language model, using three pairwise…

Computer Vision and Pattern Recognition · Computer Science 2017-08-28 Marco Pedersoli, Thomas Lucas, Cordelia Schmid, Jakob Verbeek

Image Captioning as Neural Machine Translation Task in SOCKEYE

Image captioning is an interdisciplinary research problem that stands between computer vision and natural language processing. The task is to generate a textual description of the content of an image. The typical model used for image…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Loris Bazzani, Tobias Domhan, Felix Hieber

Exploring Explicit and Implicit Visual Relationships for Image Captioning

Image captioning is one of the most challenging tasks in AI, which aims to automatically generate textual sentences for an image. Recent methods for image captioning follow an encoder-decoder framework that transforms the sequence of salient…

Computer Vision and Pattern Recognition · Computer Science 2021-05-07 Zeliang Song, Xiaofei Zhou

Image Captioning

This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Vikram Mullachery, Vishal Motwani

Attention Beam: An Image Captioning Approach

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently…

Computer Vision and Pattern Recognition · Computer Science 2020-11-12 Anubhav Shrimal, Tanmoy Chakraborty

Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning

Automatic transcription of scene understanding in images and videos is a step towards artificial general intelligence. Image captioning is a nomenclature for describing meaningful information in an image using computer vision techniques.…

Computer Vision and Pattern Recognition · Computer Science 2021-09-17 Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Joonmo Kim, Moongu Jeon

Image Captioning using Deep Neural Architectures

Automatically creating the description of an image using natural language sentences, such as English, is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses…

Computer Vision and Pattern Recognition · Computer Science 2018-10-03 Parth Shah, Vishvajit Bakarola, Supriya Pati

Iconographic Image Captioning for Artworks

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the…

Computer Vision and Pattern Recognition · Computer Science 2021-02-09 Eva Cetinic

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a potential research topic in artificial intelligence that has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

Robust Image Captioning

Automated captioning of photos is a mission that incorporates the difficulties of photo analysis and text generation. One essential feature of captioning is the concept of attention: how to determine what to specify and in which sequence.…

Computer Vision and Pattern Recognition · Computer Science 2020-12-18 Daniel Yarnell, Xian Wang

Describing and Localizing Multiple Changes with Transformers

Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on a single…

Computer Vision and Pattern Recognition · Computer Science 2021-09-16 Yue Qiu, Shintaro Yamamoto, Kodai Nakashima, Ryota Suzuki, Kenji Iwata, Hirokatsu Kataoka, Yutaka Satoh

ReFormer: The Relational Transformer for Image Captioning

Image captioning is shown to be able to achieve a better performance by using scene graphs to represent the relations of objects in the image. The current captioning encoders generally use a Graph Convolutional Net (GCN) to represent the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-18 Xuewen Yang, Yingru Liu, Xin Wang

Image Captioning based on Deep Learning Methods: A Survey

Image captioning is a challenging task that is attracting more and more attention in the field of artificial intelligence, and it can be applied to efficient image retrieval, intelligent blind guidance, human-computer interaction, etc. In…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Yiyu Wang, Jungang Xu, Yingfei Sun, Ben He

Automated Image Captioning with CNNs and Transformers

This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ various different…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Joshua Adrian Cahyono, Jeremy Nathan Jusuf

Controllable Image Captioning

State-of-the-art image captioners can generate accurate sentences to describe images in a sequence to sequence manner without considering the controllability and interpretability. This, however, is far from making image captioning widely…

Computer Vision and Pattern Recognition · Computer Science 2022-05-26 Luka Maxwell