Related papers: CIDEr: Consensus-based Image Description Evaluatio…

CIDEr-R: Robust Consensus-based Image Description Evaluation

This paper shows that CIDEr-D, a traditional evaluation metric for image description, does not work properly on datasets where the number of words in the sentence is significantly greater than those in the MS COCO Captions dataset. We also…

Computer Vision and Pattern Recognition · Computer Science 2021-09-29 Gabriel Oliveira dos Santos, Esther Luna Colombini, Sandra Avila

Describing like humans: on diversity in image captioning

Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above…

Computer Vision and Pattern Recognition · Computer Science 2019-05-16 Qingzhong Wang, Antoni B. Chan

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. Popular metrics, such as BLEU and CIDEr, are based solely on text matching between reference captions and machine-generated captions,…

Computation and Language · Computer Science 2019-09-06 Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao

On Distinctive Image Captioning via Comparing and Reweighting

Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human…

Computer Vision and Pattern Recognition · Computer Science 2022-04-11 Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets

A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are…

Computer Vision and Pattern Recognition · Computer Science 2020-09-30 Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

Learning to Evaluate Image Captioning

Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well known blind spots to…

Computer Vision and Pattern Recognition · Computer Science 2018-06-19 Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning…

Computer Vision and Pattern Recognition · Computer Science 2021-10-07 Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

A Sanity Check on Composed Image Retrieval

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is…

Computer Vision and Pattern Recognition · Computer Science 2026-04-15 Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

Effectively aligning with human judgment when evaluating machine-generated image captions represents a complex yet intriguing challenge. Existing evaluation metrics like CIDEr or CLIP-Score fall short in this regard as they do not take into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-31 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

CLAIR: Evaluating Image Captions with Large Language Models

The evaluation of machine-generated image captions poses an interesting yet persistent challenge. Effective evaluation measures must consider numerous dimensions of similarity, including semantic relevance, visual structure, object…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny

IC3: Image Captioning by Committee Consensus

If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

VCRScore: Image captioning metric based on V&L Transformers, CLIP, and precision-recall

Image captioning has become an essential Vision & Language research task. It is about predicting the most accurate caption given a specific image or video. The research community has achieved impressive results by continuously proposing new…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Guillermo Ruiz, Tania Ramírez, Daniela Moctezuma

Attention Beam: An Image Captioning Approach

The aim of image captioning is to generate textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently…

Computer Vision and Pattern Recognition · Computer Science 2020-11-12 Anubhav Shrimal, Tanmoy Chakraborty

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in contrast to the reference-free manner in which humans assess caption…

Computer Vision and Pattern Recognition · Computer Science 2022-03-25 Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi

A Novel Evaluation Framework for Image2Text Generation

Evaluating the quality of automatically generated image descriptions is challenging, requiring metrics that capture various aspects such as grammaticality, coverage, correctness, and truthfulness. While human evaluation offers valuable…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a…

Computation and Language · Computer Science 2015-04-10 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

Composed Image Retrieval (CIR) is a pivotal and complex task in multimodal understanding. Current CIR benchmarks typically feature limited query categories and fail to capture the diverse requirements of real-world scenarios. To bridge this…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Tingyu Song, Yanzhao Zhang, Mingxin Li, Zhuoning Guo, Dingkun Long, Pengjun Xie, Siyue Zhang, Yilun Zhao, Shu Wu

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

COSMic: A Coherence-Aware Generation Metric for Image Descriptions

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and…

Computation and Language · Computer Science 2022-03-21 Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani

WEmbSim: A Simple yet Effective Metric for Image Captioning

The area of automatic image caption evaluation is still undergoing intensive research to address the needs of generating captions which can meet adequacy and fluency requirements. Based on our past attempts at developing highly…

Computer Vision and Pattern Recognition · Computer Science 2020-12-25 Naeha Sharif, Lyndon White, Mohammed Bennamoun, Wei Liu, Syed Afaq Ali Shah