Related papers: Controllable Image Captioning

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior. As an image can be described in infinite ways depending on the goal and the context at…

Computer Vision and Pattern Recognition · Computer Science 2019-05-10 Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Controllable Image Captioning via Prompting

Despite the remarkable progress of image captioning, existing captioners typically lack the controllable capability to generate desired image captions, e.g., describing the image in a rough or detailed manner, in a factual or emotional…

Computer Vision and Pattern Recognition · Computer Science 2022-12-06 Ning Wang, Jiahao Xie, Jihao Wu, Mingbo Jia, Linlin Li

Image Captioning through Image Transformer

Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: How to decide what to describe and in which order. Inspired by the…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar, Hafida Benhidour, Said Kerrache

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Understanding Guided Image Captioning Performance across Domains

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Image Captioning

This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Vikram Mullachery, Vishal Motwani

Comprehensive Image Captioning via Scene Graph Decomposition

We address the challenging problem of image captioning by revisiting the representation of image scene graph. At the core of our method lies the decomposition of a scene graph into a set of sub-graphs, with each sub-graph capturing a…

Computer Vision and Pattern Recognition · Computer Science 2020-07-24 Yiwu Zhong, Liwei Wang, Jianshu Chen, Dong Yu, Yin Li

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning

An image captioning model flexibly switching its language pattern, e.g., descriptiveness and length, should be useful since it can be applied to diverse applications. However, despite the dramatic improvement in generative vision-language…

Computer Vision and Pattern Recognition · Computer Science 2025-07-03 Kuniaki Saito, Donghyun Kim, Kwanyong Park, Atsushi Hashimoto, Yoshitaka Ushiku

Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech

Image captioning is an ambiguous problem, with many suitable captions for an image. To address ambiguity, beam search is the de facto method for sampling multiple captions. However, beam search is computationally expensive and known to…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander Schwing, D. A. Forsyth

Image Representations and New Domains in Neural Image Captioning

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a…

Computation and Language · Computer Science 2015-08-11 Jack Hessel, Nicolas Savva, Michael J. Wilber

ReFormer: The Relational Transformer for Image Captioning

Image captioning is shown to be able to achieve a better performance by using scene graphs to represent the relations of objects in the image. The current captioning encoders generally use a Graph Convolutional Net (GCN) to represent the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-18 Xuewen Yang, Yingru Liu, Xin Wang

Macroscopic Control of Text Generation for Image Captioning

Despite the fact that image captioning models have been able to generate impressive descriptions for a given image, challenges remain: (1) the controllability and diversity of existing models are still far from satisfactory; (2) models…

Computer Vision and Pattern Recognition · Computer Science 2021-01-21 Zhangzi Zhu, Tianlei Wang, Hong Qu

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data

The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate the language structure…

Computer Vision and Pattern Recognition · Computer Science 2018-07-24 Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang

Unsupervised Image Captioning

Deep neural networks have achieved great successes on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire. In this paper, we make the first…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

Informative Image Captioning with External Sources of Information

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained…

Computation and Language · Computer Science 2019-06-24 Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation

Recent work in computer vision has yielded impressive results in automatically describing images with natural language. Most of these systems generate captions in a single language, requiring multiple language-specific models to build a…

Computer Vision and Pattern Recognition · Computer Science 2017-06-21 Satoshi Tsutsui, David Crandall

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

A creative image-and-text generative AI system mimics humans' extraordinary abilities to provide users with diverse and comprehensive caption suggestions, as well as rich image creations. In this work, we demonstrate such an AI creation…

Computer Vision and Pattern Recognition · Computer Science 2021-10-20 Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

Image Captioning with Compositional Neural Module Networks

In image captioning where fluency is an important factor in evaluation, e.g., $n$-gram metrics, sequential models are commonly used; however, sequential models generally result in overgeneralized expressions that lack the details that may…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Junjiao Tian, Jean Oh