Related papers: Learning to Evaluate Image Captioning

200 papers

Despite considerable progress, state-of-the-art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption…

Computer Vision and Pattern Recognition · Computer Science 2020-09-10 Zeyu Wang, Berthy Feng, Karthik Narasimhan, Olga Russakovsky

Automatic evaluation metrics hold a fundamental importance in the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system…

Artificial Intelligence · Computer Science 2020-12-25 Naeha Sharif, Lyndon White, Mohammed Bennamoun, Wei Liu, Syed Afaq Ali Shah

Image captioning has become an essential Vision & Language research task. It is about predicting the most accurate caption given a specific image or video. The research community has achieved impressive results by continuously proposing new…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Guillermo Ruiz, Tania Ramírez, Daniela Moctezuma

Effectively aligning with human judgment when evaluating machine-generated image captions represents a complex yet intriguing challenge. Existing evaluation metrics like CIDEr or CLIP-Score fall short in this regard as they do not take into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-31 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above…

Computer Vision and Pattern Recognition · Computer Science 2019-05-16 Qingzhong Wang, Antoni B. Chan

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor…

Computer Vision and Pattern Recognition · Computer Science 2016-08-01 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

The image captioning task is about generating suitable descriptions from images. For this task there can be several challenges such as accuracy, fluency and diversity. However, there are few metrics that can cover all these properties while…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Chao Zeng, Sam Kwong

A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are…

Computer Vision and Pattern Recognition · Computer Science 2020-09-30 Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

The task of image captioning has recently been gaining popularity, and with it the complex task of evaluating the quality of image captioning models. In this work, we present the first survey and taxonomy of over 70 different image…

Computation and Language · Computer Science 2025-09-16 Uri Berger, Gabriel Stanovsky, Omri Abend, Lea Frermann

Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human…

Computer Vision and Pattern Recognition · Computer Science 2022-04-11 Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation…

Computer Vision and Pattern Recognition · Computer Science 2018-03-14 Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy

Automatically evaluating the quality of image captions can be very challenging since human language is so flexible that there can be various expressions for the same meaning. Most of the current captioning metrics rely on token level…

Computer Vision and Pattern Recognition · Computer Science 2021-06-30 Chao Zeng, Tiesong Zhao, Sam Kwong

Image captioning evaluation remains a significant challenge, as vision-language models evolve toward more challenging capabilities such as generating long-form and context-rich descriptions. State-of-the-art evaluation metrics involve…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Gonçalo Gomes, Bruno Martins, Chrysoula Zerva

Image captioning studies heavily rely on automatic evaluation metrics such as BLEU and METEOR. However, such n-gram-based metrics have been shown to correlate poorly with human evaluation, leading to the proposal of alternative metrics such…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Yuiga Wada, Kanta Kaneda, Komei Sugiura

Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is…

Computer Vision and Pattern Recognition · Computer Science 2015-06-04 Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh

Evaluating the quality of automatically generated image descriptions is challenging, requiring metrics that capture various aspects such as grammaticality, coverage, correctness, and truthfulness. While human evaluation offers valuable…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas

The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures. In this paper, we propose a new recipe for a…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. Popular metrics, such as BLEU and CIDEr, are based solely on text matching between reference captions and machine-generated captions,…

Computation and Language · Computer Science 2019-09-06 Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao

The evaluation of machine-generated image captions poses an interesting yet persistent challenge. Effective evaluation measures must consider numerous dimensions of similarity, including semantic relevance, visual structure, object…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny

Most pre-trained learning systems are known to suffer from bias, which typically emerges from the data, the model, or both. Measuring and quantifying bias and its sources is a challenging task and has been extensively studied in image…

Computer Vision and Pattern Recognition · Computer Science 2023-06-07 Eslam Mohamed Bakr, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny