Related papers: Boosting Entity-aware Image Captioning with Multi-…

Entity-aware Image Caption Generation

Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images. In this paper we propose a new task which aims to generate informative image captions, given…

Computation and Language · Computer Science 2018-11-08 Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang

How to Understand Named Entities: Using Common Sense for News Captioning

News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to…

Computation and Language · Computer Science 2024-03-12 Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning

News image captioning requires a model to generate an informative caption rich in entities, given the news image and the associated news article. Current MLLMs still bear limitations in handling entity information in news image captioning…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Xiaojun Wan

ICECAP: Information Concentrated Entity-aware Image Captioning

Most current image captioning systems focus on describing general image content, and lack background knowledge to deeply understand the image, such as exact named entities or concrete events. In this work, we focus on the entity-aware news…

Computer Vision and Pattern Recognition · Computer Science 2021-08-05 Anwen Hu, Shizhe Chen, Qin Jin

Video Summarization: Towards Entity-Aware Captions

Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

Learning semantic Image attributes using Image recognition and knowledge graph embeddings

Extracting structured knowledge from texts has traditionally been used for knowledge base generation. However, other sources of information, such as images can be leveraged into this process to build more complete and richer knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Ashutosh Tiwari, Sandeep Varma

Improving Fake News Detection by Using an Entity-enhanced Framework to Fuse Diverse Multimodal Clues

Recently, fake news with text and images has achieved more effective diffusion than text-only fake news, raising a severe issue of multimodal fake news detection. Current studies on this issue have made significant contributions to…

Multimedia · Computer Science 2021-08-25 Peng Qi, Juan Cao, Xirong Li, Huan Liu, Qiang Sheng, Xiaoyue Mi, Qin He, Yongbiao Lv, Chenyang Guo, Yingchao Yu

Informative Image Captioning with External Sources of Information

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained…

Computation and Language · Computer Science 2019-06-24 Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut

Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

Coherent entity-aware multi-image captioning aims to generate coherent captions for neighboring images in a news document. There are coherence relationships among neighboring images because they often describe the same entities or events. These…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Jingqiang Chen

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information, even to the extent of inventing plausible explanations when contextual information and images do not match. In…

Computer Vision and Pattern Recognition · Computer Science 2022-09-22 Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Mathilde Caron, Alireza Fathi, Cordelia Schmid, Ahmet Iscen

Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning

News image captioning aims to produce journalistically informative descriptions by combining visual content with contextual cues from associated articles. Despite recent advances, existing methods struggle with three key challenges: (1)…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Xiaoxing You, Qiang Huang, Lingyu Li, Chi Zhang, Xiaopeng Liu, Min Zhang, Jun Yu

Good News, Everyone! Context driven entity-aware captioning for news images

Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-03 Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

Visual News: Benchmark and Challenges in News Image Captioning

We propose Visual News Captioner, an entity-aware model for the task of news image captioning. We also introduce Visual News, a large-scale benchmark consisting of more than one million news images along with associated news articles, image…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Fuxiao Liu, Yinghan Wang, Tianlu Wang, Vicente Ordonez

Knowledge Enhanced Multi-modal Fake News Detection

Recent years have witnessed the significant damage caused by various types of fake news. Although considerable effort has been applied to address this issue and much progress has been made on detecting fake news, most existing approaches…

Social and Information Networks · Computer Science 2021-08-11 Yi Han, Amila Silva, Ling Luo, Shanika Karunasekera, Christopher Leckie

Can images help recognize entities? A study of the role of images for Multimodal NER

Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the model's…

Computation and Language · Computer Science 2021-09-21 Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining

Named entities are ubiquitous in text that naturally accompanies images, especially in domains such as news or Wikipedia articles. In previous work, named entities have been identified as a likely reason for low performance of image-text…

Computer Vision and Pattern Recognition · Computer Science 2023-04-27 Giacomo Nebbia, Adriana Kovashka

Transform and Tell: Entity-Aware News Image Captioning

We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge, especially about named entities; and they typically have linguistically…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Alasdair Tran, Alexander Mathews, Lexing Xie

Multimodal Entity Tagging with Multimodal Knowledge Base

To enhance research on multimodal knowledge base and multimodal information processing, we propose a new task called multimodal entity tagging (MET) with a multimodal knowledge base (MKB). We also develop a dataset for the problem using an…

Information Retrieval · Computer Science 2022-07-29 Hao Peng, Hang Li, Lei Hou, Juanzi Li, Chao Qiao

A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities

The ability to reason with and integrate different sensory inputs is the foundation underpinning human intelligence and it is the reason for the growing interest in modelling multi-modal information within Knowledge Graphs. Multi-Modal…

Artificial Intelligence · Computer Science 2024-10-18 Gianluca Apriceno, Valentina Tamma, Tania Bailoni, Jacopo de Berardinis, Mauro Dragoni