Related papers: Boosting Entity-aware Image Captioning with Multi-…

200 papers

Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images. In this paper we propose a new task which aims to generate informative image captions, given…

Computation and Language · Computer Science 2018-11-08 Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang

News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to…

Computation and Language · Computer Science 2024-03-12 Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

News image captioning requires a model to generate an informative caption rich in entities, given the news image and the associated news article. Current MLLMs still bear limitations in handling entity information in news image captioning…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Xiaojun Wan

Most current image captioning systems focus on describing general image content, and lack background knowledge to deeply understand the image, such as exact named entities or concrete events. In this work, we focus on the entity-aware news…

Computer Vision and Pattern Recognition · Computer Science 2021-08-05 Anwen Hu, Shizhe Chen, Qin Jin

Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

Extracting structured knowledge from texts has traditionally been used for knowledge base generation. However, other sources of information, such as images can be leveraged into this process to build more complete and richer knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Ashutosh Tiwari, Sandeep Varma

Recently, fake news with text and images have achieved more effective diffusion than text-only fake news, raising a severe issue of multimodal fake news detection. Current studies on this issue have made significant contributions to…

Multimedia · Computer Science 2021-08-25 Peng Qi, Juan Cao, Xirong Li, Huan Liu, Qiang Sheng, Xiaoyue Mi, Qin He, Yongbiao Lv, Chenyang Guo, Yingchao Yu

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained…

Computation and Language · Computer Science 2019-06-24 Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

Coherent entity-aware multi-image captioning aims to generate coherent captions for neighboring images in a news document. There are coherence relationships among neighboring images because they often describe the same entities or events. These…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Jingqiang Chen

Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information, even to the extent of inventing plausible explanations when contextual information and images do not match. In…

Computer Vision and Pattern Recognition · Computer Science 2022-09-22 Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Mathilde Caron, Alireza Fathi, Cordelia Schmid, Ahmet Iscen

News image captioning aims to produce journalistically informative descriptions by combining visual content with contextual cues from associated articles. Despite recent advances, existing methods struggle with three key challenges: (1)…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Xiaoxing You, Qiang Huang, Lingyu Li, Chi Zhang, Xiaopeng Liu, Min Zhang, Jun Yu

Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-03 Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

We propose Visual News Captioner, an entity-aware model for the task of news image captioning. We also introduce Visual News, a large-scale benchmark consisting of more than one million news images along with associated news articles, image…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Fuxiao Liu, Yinghan Wang, Tianlu Wang, Vicente Ordonez

Recent years have witnessed the significant damage caused by various types of fake news. Although considerable effort has been applied to address this issue and much progress has been made on detecting fake news, most existing approaches…

Social and Information Networks · Computer Science 2021-08-11 Yi Han, Amila Silva, Ling Luo, Shanika Karunasekera, Christopher Leckie

Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the model's…

Computation and Language · Computer Science 2021-09-21 Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

Named entities are ubiquitous in text that naturally accompanies images, especially in domains such as news or Wikipedia articles. In previous work, named entities have been identified as a likely reason for low performance of image-text…

Computer Vision and Pattern Recognition · Computer Science 2023-04-27 Giacomo Nebbia, Adriana Kovashka

We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge, especially about named entities; and they typically have linguistically…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Alasdair Tran, Alexander Mathews, Lexing Xie

To enhance research on multimodal knowledge base and multimodal information processing, we propose a new task called multimodal entity tagging (MET) with a multimodal knowledge base (MKB). We also develop a dataset for the problem using an…

Information Retrieval · Computer Science 2022-07-29 Hao Peng, Hang Li, Lei Hou, Juanzi Li, Chao Qiao

The ability to reason with and integrate different sensory inputs is the foundation underpinning human intelligence and it is the reason for the growing interest in modelling multi-modal information within Knowledge Graphs. Multi-Modal…

Artificial Intelligence · Computer Science 2024-10-18 Gianluca Apriceno, Valentina Tamma, Tania Bailoni, Jacopo de Berardinis, Mauro Dragoni