Related papers: Image-Caption Encoding for Improving Zero-Shot Gen…

200 papers

Image captioning is conventionally formulated as the task of generating captions for images that match the distribution of reference image-caption pairs. However, reference captions in standard captioning datasets are short and may not…

Computer Vision and Pattern Recognition · Computer Science 2023-08-01 Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen

Image captioning research achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as training images. However, when facing…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Gabi Shalev, Gal-Lev Shalev, Joseph Keshet

Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Berkan Demirel, Ramazan Gokberk Cinbis

Image caption generation is a long standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within…

Computer Vision and Pattern Recognition · Computer Science 2019-08-02 Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

In this paper, we present our solution to the New frontiers for Zero-shot Image Captioning Challenge. Different from the traditional image captioning datasets, this challenge includes a larger new variety of visual concepts from many…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories, training-free and text-only-training. Generally, these two types of methods realize zero-shot IC by integrating pretrained…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representation of such a format in Natural Language has a huge variety of applications such as helping the visually impaired etc.…

Computer Vision and Pattern Recognition · Computer Science 2020-02-25 Madhavan Seshadri, Malavika Srikanth, Mikhail Belov

Image captioning aims at generating descriptive and meaningful textual descriptions of images, enabling a broad range of vision-language applications. Prior works have demonstrated that harnessing the power of Contrastive Image Language…

Computer Vision and Pattern Recognition · Computer Science 2024-01-05 Longtian Qiu, Shan Ning, Xuming He

Images taken out of their context are the most prevalent form of multimodal misinformation. Debunking them requires (1) providing the true context of the image and (2) checking the veracity of the image's caption. However, existing…

Computation and Language · Computer Science 2025-02-04 Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Existing approaches to image captioning usually generate the sentence word-by-word from left to right, with the constraint of conditioned on local context including the given image and history generated words. There have been many studies…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing…

Computer Vision and Pattern Recognition · Computer Science 2018-03-15 Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

Recent advancements in image captioning have explored text-only training methods to overcome the limitations of paired image-text data. However, existing text-only training methods often overlook the modality gap between using text data…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Soeun Lee, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim

The advent of vision-language pre-training techniques enhanced substantial progress in the development of models for image captioning. However, these models frequently produce generic captions and may omit semantically important image…

Computer Vision and Pattern Recognition · Computer Science 2023-11-17 Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel

Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We…

Computer Vision and Pattern Recognition · Computer Science 2017-07-21 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould