Related papers: Explicit Image Caption Editing

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism

Explicit Caption Editing (ECE) -- refining reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE) -- has raised significant attention due to its explainable and human-like nature. After training with…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Zhen Wang, Xinyun Jiang, Jun Xiao, Tao Chen, Long Chen

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. Popular metrics, such as BLEU and CIDEr, are based solely on text matching between reference captions and machine-generated captions,…

Computation and Language · Computer Science 2019-09-06 Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar, Hafida Benhidour, Said Kerrache

Rethinking the Reference-based Distinctive Image Captioning

Distinctive Image Captioning (DIC) -- generating distinctive captions that describe the unique details of a target image -- has received considerable attention over the last few years. A recent DIC work proposes to generate distinctive…

Computer Vision and Pattern Recognition · Computer Science 2022-07-25 Yangjun Mao, Long Chen, Zhihong Jiang, Dong Zhang, Zhimeng Zhang, Jian Shao, Jun Xiao

Show, Edit and Tell: A Framework for Editing Image Captions

Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when…

Computer Vision and Pattern Recognition · Computer Science 2020-03-09 Fawaz Sammani, Luke Melas-Kyriazi

An Ensemble Model with Attention Based Mechanism for Image Captioning

Image captioning creates informative text from an input image by creating a relationship between the words and the actual content of an image. Recently, deep learning models that utilize transformers have been the most successful in…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Israa Al Badarneh, Bassam Hammo, Omar Al-Kadi

Face-Cap: Image Captioning using Facial Expression Analysis

Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and…

Computer Vision and Pattern Recognition · Computer Science 2019-01-28 Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive problem for computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Ahmad Asadi, Reza Safabakhsh

Exploring Explicit and Implicit Visual Relationships for Image Captioning

Image captioning is one of the most challenging tasks in AI, which aims to automatically generate textual sentences for an image. Recent methods for image captioning follow the encoder-decoder framework that transforms the sequence of salient…

Computer Vision and Pattern Recognition · Computer Science 2021-05-07 Zeliang Song, Xiaofei Zhou

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

Effectively aligning with human judgment when evaluating machine-generated image captions represents a complex yet intriguing challenge. Existing evaluation metrics like CIDEr or CLIP-Score fall short in this regard as they do not take into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-31 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

Distinctive Image Captioning (DIC) -- generating distinctive captions that describe the unique details of a target image -- has received considerable attention over the last few years. A recent DIC method proposes to generate distinctive…

Computer Vision and Pattern Recognition · Computer Science 2023-06-27 Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen

Modeling Image-Caption Rating from Comparative Judgments

Image caption rating is becoming increasingly important because computer-generated captions are used extensively for descriptive annotation. However, rating the accuracy of captions in describing images is time-consuming and subjective in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Kezia Minni, Qiang Zhang, Monoshiz Mahbub Khan, Zhe Yu

On Distinctive Image Captioning via Comparing and Reweighting

Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human…

Computer Vision and Pattern Recognition · Computer Science 2022-04-11 Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a promising research topic in artificial intelligence, which has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation

When generating images from prompts that include specific entities, the model must retain as much entity-specific knowledge as possible. However, the number of entities is almost countless, and new entities emerge; memorizing all of them…

Computation and Language · Computer Science 2026-04-21 Shintaro Ozaki, Tomoyuki Jinno, Kazuki Hayashi, Yusuke Sakai, Jingun Kwon, Hidetaka Kamigaito, Katsuhiko Hayashi, Manabu Okumura, Taro Watanabe

Guiding Image Captioning Models Toward More Specific Captions

Image captioning is conventionally formulated as the task of generating captions for images that match the distribution of reference image-caption pairs. However, reference captions in standard captioning datasets are short and may not…

Computer Vision and Pattern Recognition · Computer Science 2023-08-01 Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen

In-Context Editing: Learning Knowledge from Self-Induced Distributions

In scenarios where language models must incorporate new information efficiently without extensive retraining, traditional fine-tuning methods are prone to overfitting, degraded generalization, and unnatural language generation. To address…

Computation and Language · Computer Science 2025-04-01 Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng

Pragmatic Issue-Sensitive Image Captioning

Image captioning systems have recently improved dramatically, but they still tend to produce captions that are insensitive to the communicative goals that captions should meet. To address this, we propose Issue-Sensitive Image Captioning…

Computation and Language · Computer Science 2020-10-07 Allen Nie, Reuben Cohn-Gordon, Christopher Potts

FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion

Fashion-image editing represents a challenging computer vision task, where the goal is to incorporate selected apparel into a given input image. Most existing techniques, known as Virtual Try-On methods, deal with this task by first…

Computer Vision and Pattern Recognition · Computer Science 2023-01-06 Martin Pernuš, Clinton Fookes, Vitomir Štruc, Simon Dobrišek

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara