Related papers: Image Captioning with Unseen Objects

Caption Generation on Scenes with Seen and Unseen Object Categories

Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Berkan Demirel, Ramazan Gokberk Cinbis

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

Object-Centric Unsupervised Image Captioning

Image captioning is a longstanding problem in the field of computer vision and natural language processing. To date, researchers have produced impressive state-of-the-art performance in the age of deep learning. Most of these…

Computer Vision and Pattern Recognition · Computer Science 2022-07-20 Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim

Decoupled Novel Object Captioner

Image captioning is a challenging task where the machine automatically describes an image by sentences or phrases. It often requires a large number of paired image-sentence annotations for training. However, a pre-trained captioning model…

Computer Vision and Pattern Recognition · Computer Science 2018-08-14 Yu Wu, Linchao Zhu, Lu Jiang, Yi Yang

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

Image captioning models have lately shown impressive results when applied to standard datasets. Switching to real-life scenarios, however, constitutes a challenge due to the larger variety of visual concepts which are not covered in…

Computer Vision and Pattern Recognition · Computer Science 2021-06-04 Marco Cagrandi, Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

Unsupervised Image Captioning

Deep neural networks have achieved great successes on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire. In this paper, we make the first…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

Intention Oriented Image Captions with Guiding Objects

Although existing image caption models can produce promising results using recurrent neural networks (RNNs), it is difficult to guarantee that an object we care about is contained in generated descriptions, for example in the case that the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Yue Zheng, Yali Li, Shengjin Wang

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

A Generative Model For Zero Shot Learning Using Conditional Variational Autoencoders

Zero shot learning in Image Classification refers to the setting where images from some novel classes are absent in the training data but other information such as natural language descriptions or attribute vectors of the classes are…

Computer Vision and Pattern Recognition · Computer Science 2018-01-30 Ashish Mishra, M Shiva Krishna Reddy, Anurag Mittal, Hema A Murthy

Zero Shot Hashing

This paper provides a framework to hash images containing instances of unknown object classes. In many object recognition problems, we might have access to a huge amount of data. It may so happen that even this huge data doesn't cover the…

Computer Vision and Pattern Recognition · Computer Science 2016-10-11 Shubham Pachori, Shanmuganathan Raman

Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning

Leveraging class semantic descriptions and examples of known objects, zero-shot learning makes it possible to train a recognition model for an object class whose examples are not available. In this paper, we propose a novel zero-shot…

Computer Vision and Pattern Recognition · Computer Science 2017-08-22 Soravit Changpinyo, Wei-Lun Chao, Fei Sha

Learning Object Detection from Captions via Textual Scene Attributes

Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect, as annotators need to label objects and their bounding boxes. Thus, it is a significant challenge to use cheaper…

Computer Vision and Pattern Recognition · Computer Science 2020-10-01 Achiya Jerbi, Roei Herzig, Jonathan Berant, Gal Chechik, Amir Globerson

Generating Images from Captions with Attention

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the…

Machine Learning · Computer Science 2016-03-01 Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

Synthesizing the Unseen for Zero-shot Object Detection

The existing zero-shot detection approaches project visual features to the semantic domain for seen objects, hoping to map unseen objects to their corresponding semantics during inference. However, since the unseen objects are never…

Computer Vision and Pattern Recognition · Computer Science 2020-10-20 Nasir Hayat, Munawar Hayat, Shafin Rahman, Salman Khan, Syed Waqas Zamir, Fahad Shahbaz Khan

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

While recent deep neural network models have achieved promising results on the image captioning task, they rely largely on the availability of corpora with paired image and sentence captions to describe objects in context. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2016-04-29 Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell

Dense Captioning with Joint Inference and Visual Context

Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images,…

Computer Vision and Pattern Recognition · Computer Science 2017-08-09 Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li

Captioning Images with Diverse Objects

Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number…

Computer Vision and Pattern Recognition · Computer Science 2017-07-24 Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

Zero-Shot Object Recognition System based on Topic Model

Object recognition systems usually require fully complete manually labeled training data to train the classifier. In this paper, we study the problem of object recognition where the training samples are missing during the classifier…

Computer Vision and Pattern Recognition · Computer Science 2014-10-15 Wai Lam Hoo, Chee Seng Chan