Related papers: Image-Caption Encoding for Improving Zero-Shot Gen…

Guiding Image Captioning Models Toward More Specific Captions

Image captioning is conventionally formulated as the task of generating captions for images that match the distribution of reference image-caption pairs. However, reference captions in standard captioning datasets are short and may not…

Computer Vision and Pattern Recognition · Computer Science 2023-08-01 Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen

A Baseline for Detecting Out-of-Distribution Examples in Image Captioning

Image captioning research achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as training images. However, when facing…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Gabi Shalev, Gal-Lev Shalev, Joseph Keshet

Caption Generation on Scenes with Seen and Unseen Object Categories

Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Berkan Demirel, Ramazan Gokberk Cinbis

Image Captioning with Unseen Objects

Image caption generation is a long standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within…

Computer Vision and Pattern Recognition · Computer Science 2019-08-02 Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

The Solution for the CVPR2023 NICE Image Captioning Challenge

In this paper, we present our solution to the New frontiers for Zero-shot Image Captioning Challenge. Different from the traditional image captioning datasets, this challenge includes a larger new variety of visual concepts from many…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

MeaCap: Memory-Augmented Zero-shot Image Captioning

Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories, training-free and text-only-training. Generally, these two types of methods realize zero-shot IC by integrating pretrained…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

Image to Language Understanding: Captioning approach

Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representation of such a format in Natural Language has a huge variety of applications such as helping the visually impaired etc.…

Computer Vision and Pattern Recognition · Computer Science 2020-02-25 Madhavan Seshadri, Malavika Srikanth, Mikhail Belov

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Image captioning aims at generating descriptive and meaningful textual descriptions of images, enabling a broad range of vision-language applications. Prior works have demonstrated that harnessing the power of Contrastive Image Language…

Computer Vision and Pattern Recognition · Computer Science 2024-01-05 Longtian Qiu, Shan Ning, Xuming He

COVE: COntext and VEracity prediction for out-of-context images

Images taken out of their context are the most prevalent form of multimodal misinformation. Debunking them requires (1) providing the true context of the image and (2) checking the veracity of the image's caption. However, existing…

Computation and Language · Computer Science 2025-02-04 Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Understanding Guided Image Captioning Performance across Domains

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Efficient Modeling of Future Context for Image Captioning

Existing approaches to image captioning usually generate the sentence word-by-word from left to right, with the constraint of conditioned on local context including the given image and history generated words. There have been many studies…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing…

Computer Vision and Pattern Recognition · Computer Science 2018-03-15 Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning

Recent advancements in image captioning have explored text-only training methods to overcome the limitations of paired image-text data. However, existing text-only training methods often overlook the modality gap between using text data…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Soeun Lee, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim

FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions

The advent of vision-language pre-training techniques enhanced substantial progress in the development of models for image captioning. However, these models frequently produce generic captions and may omit semantically important image…

Computer Vision and Pattern Recognition · Computer Science 2023-11-17 Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel

Fine-Grained Image Captioning with Global-Local Discriminative Objective

Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We…

Computer Vision and Pattern Recognition · Computer Science 2017-07-21 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould