Related papers: Microsoft COCO Captions: Data Collection and Evalu…

From Captions to Visual Concepts and Back

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to…

Computer Vision and Pattern Recognition · Computer Science 2016-02-22 Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

Evaluating authenticity and quality of image captions via sentiment and semantic analyses

The growth of deep learning (DL) relies heavily on huge amounts of labelled data for tasks such as natural language processing and computer vision. Specifically, in image-to-text or image-to-image pipelines, opinion (sentiment) may be…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Aleksei Krotov, Alison Tebo, Dylan K. Picart, Aaron Dean Algave

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most…

Computation and Language · Computer Science 2017-05-03 Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi

HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales

Current captioning datasets focus on object-centric captions, describing the visible objects in the image, e.g. "people eating food in a park". Although these datasets are useful to evaluate the ability of Vision & Language models to…

Computation and Language · Computer Science 2023-09-26 Michele Cafagna, Kees van Deemter, Albert Gatt

Attention Beam: An Image Captioning Approach

The aim of image captioning is to generate textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently…

Computer Vision and Pattern Recognition · Computer Science 2020-11-12 Anubhav Shrimal, Tanmoy Chakraborty

Face-Cap: Image Captioning using Facial Expression Analysis

Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and…

Computer Vision and Pattern Recognition · Computer Science 2019-01-28 Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

Image Captioning using Facial Expression and Attention

Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective…

Computer Vision and Pattern Recognition · Computer Science 2020-04-16 Omid Mohamad Nezami, Mark Dras, Stephen Wan, Cecile Paris

Alleviating Noisy Data in Image Captioning with Cooperative Distillation

Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of…

Computer Vision and Pattern Recognition · Computer Science 2020-12-23 Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

Intrinsic Image Captioning Evaluation

The image captioning task is about to generate suitable descriptions from images. For this task there can be several challenges such as accuracy, fluency and diversity. However there are few metrics that can cover all these properties while…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Chao Zeng, Sam Kwong

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a…

Computation and Language · Computer Science 2015-04-10 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

Improving Multimodal Datasets with Image Captioning

Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our…

Machine Learning · Computer Science 2023-10-27 Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt

Engaging Image Captioning Via Personality

Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone and (to a human) state the obvious (e.g., "a man playing a guitar"). While such tasks are useful to verify that a machine understands the content of an…

Computer Vision and Pattern Recognition · Computer Science 2019-03-21 Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston

SPICE: Semantic Propositional Image Caption Evaluation

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor…

Computer Vision and Pattern Recognition · Computer Science 2016-08-01 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

COCO-Urdu: A Large-Scale Urdu Image-Caption Dataset with Multimodal Quality Estimation

Urdu, spoken by over 250 million people, remains critically under-served in multimodal and vision-language research. The absence of large-scale, high-quality datasets has limited the development of Urdu-capable systems and reinforced biases…

Computer Vision and Pattern Recognition · Computer Science 2025-09-12 Umair Hassan

Quality Estimation for Image Captions Based on Large-scale Human Evaluations

Automatic image captioning has improved significantly over the last few years, but the problem is far from being solved, with state of the art models still often producing low quality captions when used in the wild. In this paper, we focus…

Computation and Language · Computer Science 2021-06-03 Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

Visual Semantic Relatedness Dataset for Image Captioning

Modern image captioning system relies heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available dataset…

Computation and Language · Computer Science 2023-05-02 Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró

SentiCap: Generating Image Descriptions with Sentiments

The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from the current systems. One such…

Computer Vision and Pattern Recognition · Computer Science 2015-12-15 Alexander Mathews, Lexing Xie, Xuming He

ChatPainter: Improving Text to Image Generation using Dialogue

Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images.…

Computer Vision and Pattern Recognition · Computer Science 2018-02-23 Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, Yoshua Bengio

A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

Image captioning is a computer vision task that involves generating natural language descriptions for images. This method has numerous applications in various domains, including image retrieval systems, medicine, and various industries.…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Sai Suprabhanu Nallapaneni, Subrahmanyam Konakanchi