Related papers: Microsoft COCO Captions: Data Collection and Evalu…

200 papers

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to…

Computer Vision and Pattern Recognition · Computer Science 2016-02-22 Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

The growth of deep learning (DL) relies heavily on huge amounts of labelled data for tasks such as natural language processing and computer vision. Specifically, in image-to-text or image-to-image pipelines, opinion (sentiment) may be…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Aleksei Krotov, Alison Tebo, Dylan K. Picart, Aaron Dean Algave

In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most…

Computation and Language · Computer Science 2017-05-03 Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi

Current captioning datasets focus on object-centric captions, describing the visible objects in the image, e.g. "people eating food in a park". Although these datasets are useful to evaluate the ability of Vision & Language models to…

Computation and Language · Computer Science 2023-09-26 Michele Cafagna, Kees van Deemter, Albert Gatt

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines as it requires the ability to comprehend the image (computer vision) and consequently…

Computer Vision and Pattern Recognition · Computer Science 2020-11-12 Anubhav Shrimal, Tanmoy Chakraborty

Image captioning is the process of generating a natural language description of an image. Most current image captioning models, however, do not take into account the emotional aspect of an image, which is very relevant to activities and…

Computer Vision and Pattern Recognition · Computer Science 2019-01-28 Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective…

Computer Vision and Pattern Recognition · Computer Science 2020-04-16 Omid Mohamad Nezami, Mark Dras, Stephen Wan, Cecile Paris

Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of…

Computer Vision and Pattern Recognition · Computer Science 2020-12-23 Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

The image captioning task is to generate suitable descriptions from images. This task poses several challenges, such as accuracy, fluency, and diversity. However, there are few metrics that can cover all these properties while…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Chao Zeng, Sam Kwong

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a…

Computation and Language · Computer Science 2015-04-10 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our…

Machine Learning · Computer Science 2023-10-27 Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt

Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone and (to a human) state the obvious (e.g., "a man playing a guitar"). While such tasks are useful to verify that a machine understands the content of an…

Computer Vision and Pattern Recognition · Computer Science 2019-03-21 Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor…

Computer Vision and Pattern Recognition · Computer Science 2016-08-01 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

Urdu, spoken by over 250 million people, remains critically under-served in multimodal and vision-language research. The absence of large-scale, high-quality datasets has limited the development of Urdu-capable systems and reinforced biases…

Computer Vision and Pattern Recognition · Computer Science 2025-09-12 Umair Hassan

Automatic image captioning has improved significantly over the last few years, but the problem is far from being solved, with state of the art models still often producing low quality captions when used in the wild. In this paper, we focus…

Computation and Language · Computer Science 2021-06-03 Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available dataset…

Computation and Language · Computer Science 2023-05-02 Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró

The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from the current systems. One such…

Computer Vision and Pattern Recognition · Computer Science 2015-12-15 Alexander Mathews, Lexing Xie, Xuming He

Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images.…

Computer Vision and Pattern Recognition · Computer Science 2018-02-23 Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, Yoshua Bengio

Image captioning is a computer vision task that involves generating natural language descriptions for images. This method has numerous applications in various domains, including image retrieval systems, medicine, and various industries.…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Sai Suprabhanu Nallapaneni, Subrahmanyam Konakanchi