Related papers: Generating captions without looking beyond objects

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Image Captioning using Deep Neural Architectures

Automatically creating a description of an image in a natural language such as English is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses…

Computer Vision and Pattern Recognition · Computer Science 2018-10-03 Parth Shah, Vishvajit Bakarola, Supriya Pati

Caption Generation on Scenes with Seen and Unseen Object Categories

Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Berkan Demirel, Ramazan Gokberk Cinbis

Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Image captioning involves generating textual descriptions from input images, bridging the gap between computer vision and natural language processing. Recent advancements in transformer-based models have significantly improved caption…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Israa A. Albadarneh, Bassam H. Hammo, Omar S. Al-Kadi

Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned

This paper focuses on enhancing the captions generated by image-caption generation systems. We propose an approach for improving caption generation systems by choosing the most closely related output to the image rather than the most likely…

Computation and Language · Computer Science 2023-07-10 Ahmed Sabir

Image Captioning using Facial Expression and Attention

Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective…

Computer Vision and Pattern Recognition · Computer Science 2020-04-16 Omid Mohamad Nezami, Mark Dras, Stephen Wan, Cecile Paris

Generating Diverse and Meaningful Captions

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing

Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences disregarding fine-grained semantics of…

Computer Vision and Pattern Recognition · Computer Science 2019-08-07 Philipp Harzig, Dan Zecha, Rainer Lienhart, Carolin Kaiser, René Schallner

Image Captioning

This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Vikram Mullachery, Vishal Motwani

Image Representations and New Domains in Neural Image Captioning

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a…

Computation and Language · Computer Science 2015-08-11 Jack Hessel, Nicolas Savva, Michael J. Wilber

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a potential research direction in artificial intelligence that has attracted a great deal of attention recently. Most of the existing attention methods explore the mapping…

Computer Vision and Pattern Recognition · Computer Science 2020-11-03 Feicheng Huang, Zhixin Li, Haiyang Wei, Canlong Zhang, Huifang Ma

From Captions to Visual Concepts and Back

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to…

Computer Vision and Pattern Recognition · Computer Science 2016-02-22 Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image caption system…

Computer Vision and Pattern Recognition · Computer Science 2015-06-23 Junqi Jin, Kun Fu, Runpeng Cui, Fei Sha, Changshui Zhang

Iconographic Image Captioning for Artworks

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the…

Computer Vision and Pattern Recognition · Computer Science 2021-02-09 Eva Cetinic

Image Captioning with Unseen Objects

Image caption generation is a long standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within…

Computer Vision and Pattern Recognition · Computer Science 2019-08-02 Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis

Beyond Caption To Narrative: Video Captioning With Multiple Sentences

Recent advances in image captioning task have led to increasing interests in video captioning task. However, most works on video captioning are focused on generating single input of aggregated features, which hardly deviates from image…

Computer Vision and Pattern Recognition · Computer Science 2016-05-19 Andrew Shin, Katsunori Ohnishi, Tatsuya Harada

Compositional Generalization in Image Captioning

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a…

Machine Learning · Computer Science 2019-11-12 Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, Desmond Elliott

Pre-gen metrics: Predicting caption quality metrics without generating captions

Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the…

Neural and Evolutionary Computing · Computer Science 2019-02-05 Marc Tanti, Albert Gatt, Adrian Muscat

A Comprehensive Survey of Deep Learning for Image Captioning

Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga

Generating Images from Captions with Attention

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the…

Machine Learning · Computer Science 2016-03-01 Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov