Related papers: Actor-Critic Sequence Training for Image Captionin…

Self-critical Sequence Training for Image Captioning

Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of…

Machine Learning · Computer Science 2017-11-17 Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel

Image Captioning based on Deep Reinforcement Learning

Recently, policy-gradient methods for reinforcement learning have been utilized to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi, Peng Li, Bo Wang, Zhenyu Wang

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

The conventional training approach for image captioning involves pre-training a network using teacher forcing and subsequent fine-tuning with Self-Critical Sequence Training to maximize hand-crafted captioning metrics. However, when…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li

Injecting Prior Knowledge into Image Caption Generation

Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The…

Computation and Language · Computer Science 2020-08-07 Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Self-critical n-step Training for Image Captioning

Existing methods for image captioning are usually trained by cross entropy loss, which leads to exposure bias and the inconsistency between the optimizing function and evaluation metrics. Recently it has been shown that these two issues can…

Computer Vision and Pattern Recognition · Computer Science 2019-04-16 Junlong Gao, Shiqi Wang, Shanshe Wang, Siwei Ma, Wen Gao

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar, Hafida Benhidour, Said Kerrache

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning

We propose SC-Captioner, a reinforcement learning framework that enables the self-correcting capability of image caption models. Our crucial technique lies in the design of the reward function to incentivize accurate caption corrections.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-11 Lin Zhang, Xianfang Zeng, Kangcong Li, Gang Yu, Tao Chen

Improving Image Captioning with Control Signal of Sentence Quality

In image captioning datasets, each image is aligned with several descriptions. Although the quality of these descriptions varies, existing captioning models treat them equally during training. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Zhangzi Zhu, Hong Qu

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a promising research topic in artificial intelligence that has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity in reference sentences describing remote sensing data,…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Ruchika Chavhan, Biplab Banerjee, Xiao Xiang Zhu, Subhasis Chaudhuri

A Weighted Multi-Criteria Decision Making Approach for Image Captioning

Image captioning aims at automatically generating descriptions of an image in natural language. This is a challenging problem in the field of artificial intelligence that has recently received significant attention in the computer vision…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Hassan Maleki Galandouz, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

Reinforcing an Image Caption Generator Using Off-Line Human Feedback

Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only used outcome of an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In…

Computer Vision and Pattern Recognition · Computer Science 2019-11-25 Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

Image Captioning is a task that combines computer vision and natural language processing, where it aims to generate descriptive legends for images. It is a two-fold process relying on accurate image understanding and correct language…

Computer Vision and Pattern Recognition · Computer Science 2021-07-29 Ahmed Elhagry, Karima Kadaoui

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

Existing image captioning approaches typically train a one-stage sentence decoder, which struggles to generate rich fine-grained descriptions. On the other hand, multi-stage image caption models are hard to train due to the vanishing…

Computer Vision and Pattern Recognition · Computer Science 2018-03-15 Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

Image Captioning based on Deep Learning Methods: A Survey

Image captioning is a challenging task that is attracting more and more attention in the field of Artificial Intelligence, and it can be applied to efficient image retrieval, intelligent blind guidance, human-computer interaction, etc. In…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Yiyu Wang, Jungang Xu, Yingfei Sun, Ben He