Related papers: Object-Centric Unsupervised Image Captioning

Unsupervised Image Captioning

Deep neural networks have achieved great successes on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire. In this paper, we make the first…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings

Understanding images without explicit supervision has become an important problem in computer vision. In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images…

Computer Vision and Pattern Recognition · Computer Science 2019-08-27 Iro Laina, Christian Rupprecht, Nassir Navab

Image Captioning with Unseen Objects

Image caption generation is a long standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within…

Computer Vision and Pattern Recognition · Computer Science 2019-08-02 Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis

UNISON: Unpaired Cross-lingual Image Captioning

Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised…

Computation and Language · Computer Science 2022-02-08 Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu

Experimenting with Self-Supervision using Rotation Prediction for Image Captioning

Image captioning is a task in the field of Artificial Intelligence that merges computer vision and natural language processing. It is responsible for generating captions that describe images, and has various applications like…

Computer Vision and Pattern Recognition · Computer Science 2021-07-29 Ahmed Elhagry, Karima Kadaoui

Unpaired Image Captioning by Language Pivoting

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description. In general, the mapping function is learned from a…

Computer Vision and Pattern Recognition · Computer Science 2018-07-19 Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

The goal of unpaired image captioning (UIC) is to describe images without using image-caption pairs in the training phase. Although challenging, we expect the task can be accomplished by leveraging a training set of images aligned with…

Computer Vision and Pattern Recognition · Computer Science 2022-03-08 Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, Yaowei Wang, Changwen Chen

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the…

Computation and Language · Computer Science 2021-06-02 Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto

Deep Learning Approaches on Image Captioning: A Review

Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Taraneh Ghandi, Hamidreza Pourreza, Hamidreza Mahyar

Learning Object Detection from Captions via Textual Scene Attributes

Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect, as annotators need to label objects and their bounding boxes. Thus, it is a significant challenge to use cheaper…

Computer Vision and Pattern Recognition · Computer Science 2020-10-01 Achiya Jerbi, Roei Herzig, Jonathan Berant, Gal Chechik, Amir Globerson

Unsupervised Image Matching and Object Discovery as Optimization

Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an…

Computer Vision and Pattern Recognition · Computer Science 2019-04-08 Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce

Image Captioning

This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Vikram Mullachery, Vishal Motwani

A Comprehensive Survey of Deep Learning for Image Captioning

Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga

Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is an expensive task in terms of labor, time, and cost. In…

Computer Vision and Pattern Recognition · Computer Science 2023-01-27 Dong-Jin Kim, Tae-Hyun Oh, Jinsoo Choi, In So Kweon

Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach

Constructing an organized dataset comprised of a large number of images and several captions for each image is a laborious task, which requires vast human effort. On the other hand, collecting a large number of images and sentences…

Computer Vision and Pattern Recognition · Computer Science 2019-11-22 Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

Generating image descriptions in different languages is essential to satisfy users worldwide. However, it is prohibitively expensive to collect a large-scale paired image-caption dataset for every target language, which is critical for…

Computer Vision and Pattern Recognition · Computer Science 2019-08-16 Yuqing Song, Shizhe Chen, Yida Zhao, Qin Jin

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu