Related papers: CapWAP: Captioning with a Purpose

200 papers

Answering visual questions requires acquiring everyday common knowledge and modeling the semantic connections among different parts of images, which is too difficult for VQA systems to learn from images with answers as the only supervision. Meanwhile,…

Computation and Language · Computer Science 2018-05-23 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to improve VQA performance that exploits this connection by jointly generating…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

Automatically generating a human-like description for a given image is a promising research area in artificial intelligence that has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

The Controllable Image Captioning Agent (CapAgent) is an innovative system designed to bridge the gap between user simplicity and professional-level outputs in image captioning tasks. CapAgent automatically transforms user-provided simple…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Xinran Wang, Muxi Diao, Baoteng Li, Haiwen Zhang, Kongming Liang, Zhanyu Ma

Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly helpful for this task because of their…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A Smith, Jiebo Luo

While image captioning has progressed rapidly, existing works focus mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context…

Computer Vision and Pattern Recognition · Computer Science 2020-04-09 Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image…

Computer Vision and Pattern Recognition · Computer Science 2020-04-22 Giang Nguyen, Tae Joon Jun, Trung Tran, Tolcha Yalew, Daeyoung Kim

Image captioning, which generates natural language descriptions of the visual information in an image, is a crucial task in vision-language research. Previous models have typically addressed this task by aligning the generative capabilities…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Qian Cao, Xu Chen, Ruihua Song, Xiting Wang, Xinting Huang, Yuchen Ren

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

Figures, such as bar charts, pie charts, and line plots, are widely used to convey important information in a concise format. They are usually human-friendly but difficult for computers to process automatically. In this work, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2019-06-10 Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu

Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and…

Computer Vision and Pattern Recognition · Computer Science 2019-01-28 Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Interactive search sessions often contain multiple queries, where the user submits a reformulated version of the previous query in response to the original results. We aim to enhance the query recommendation experience for a commercial…

Information Retrieval · Computer Science 2020-03-03 Gaurav Verma, Vishwa Vinay, Sahil Bansal, Shashank Oberoi, Makkunda Sharma, Prakhar Gupta

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial…

Computation and Language · Computer Science 2019-06-17 Hyounghun Kim, Mohit Bansal

Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Shi Chen, Qi Zhao

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in…

Computer Vision and Pattern Recognition · Computer Science 2019-06-07 Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the…

Computer Vision and Pattern Recognition · Computer Science 2021-02-09 Eva Cetinic