Related papers: CapWAP: Captioning with a Purpose

Joint Image Captioning and Question Answering

Answering visual questions requires acquiring everyday common knowledge and modeling the semantic connections among different parts of an image, which is too difficult for VQA systems to learn from images when answers are the only supervision. Meanwhile,…

Computation and Language · Computer Science 2018-05-23 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

Generating Question Relevant Captions to Aid Visual Question Answering

Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to improve VQA performance that exploits this connection by jointly generating…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a promising research direction in artificial intelligence that has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

From Simple to Professional: A Combinatorial Controllable Image Captioning Agent

The Controllable Image Captioning Agent (CapAgent) is an innovative system designed to bridge the gap between user simplicity and professional-level outputs in image captioning tasks. CapAgent automatically transforms user-provided simple…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Xinran Wang, Muxi Diao, Baoteng Li, Haiwen Zhang, Kongming Liang, Zhanyu Ma

PromptCap: Prompt-Guided Task-Aware Image Captioning

Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly helpful for this task because of their…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A Smith, Jiebo Luo

Context-Aware Group Captioning via Self-Attention and Contrastive Features

While image captioning has progressed rapidly, existing works focus mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context…

Computer Vision and Pattern Recognition · Computer Science 2020-04-09 Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille

ContCap: A scalable framework for continual image captioning

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image…

Computer Vision and Pattern Recognition · Computer Science 2020-04-22 Giang Nguyen, Tae Joon Jun, Trung Tran, Tolcha Yalew, Daeyoung Kim

See or Guess: Counterfactually Regularized Image Captioning

Image captioning, which generates natural language descriptions of the visual information in an image, is a crucial task in vision-language research. Previous models have typically addressed this task by aligning the generative capabilities…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Qian Cao, Xu Chen, Ruihua Song, Xiting Wang, Xinting Huang, Yuchen Ren

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

Figure Captioning with Reasoning and Sequence-Level Training

Figures, such as bar charts, pie charts, and line plots, are widely used to convey important information in a concise format. They are usually human-friendly but difficult for computers to process automatically. In this work, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2019-06-10 Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu

Face-Cap: Image Captioning using Facial Expression Analysis

Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and…

Computer Vision and Pattern Recognition · Computer Science 2019-01-28 Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

Understanding Guided Image Captioning Performance across Domains

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Using Image Captions and Multitask Learning for Recommending Query Reformulations

Interactive search sessions often contain multiple queries, where the user submits a reformulated version of the previous query in response to the original results. We aim to enhance the query recommendation experience for a commercial…

Information Retrieval · Computer Science 2020-03-03 Gaurav Verma, Vishwa Vinay, Sahil Bansal, Shashank Oberoi, Makkunda Sharma, Prakhar Gupta

Generating Diverse and Meaningful Captions

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Improving Visual Question Answering by Referring to Generated Paragraph Captions

Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial…

Computation and Language · Computer Science 2019-06-17 Hyounghun Kim, Mohit Bansal

Boosted Attention: Leveraging Human Attention for Image Captioning

Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Shi Chen, Qi Zhao

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in…

Computer Vision and Pattern Recognition · Computer Science 2019-06-07 Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu

Towards Retrieval-Augmented Architectures for Image Captioning

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

Iconographic Image Captioning for Artworks

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the…

Computer Vision and Pattern Recognition · Computer Science 2021-02-09 Eva Cetinic