Related papers: Stack-Captioning: Coarse-to-Fine Learning for Imag…

Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation

Recently, automatic image caption generation has been an important focus of the work on multimodal translation task. Existing approaches can be roughly categorized into two classes, i.e., top-down and bottom-up, the former transfers the…

Computer Vision and Pattern Recognition · Computer Science 2019-09-06 Wei Wei, Ling Cheng, Xianling Mao, Guangyou Zhou, Feida Zhu

Non-Autoregressive Coarse-to-Fine Video Captioning

It is encouraged to see that progress has been made to bridge videos and natural language. However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and prefer…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Fine-Grained Image Captioning with Global-Local Discriminative Objective

Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

Injecting Prior Knowledge into Image Caption Generation

Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The…

Computation and Language · Computer Science 2020-08-07 Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

RefineCap: Concept-Aware Refinement for Image Captioning

Automatically translating images to texts involves image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided…

Computation and Language · Computer Science 2021-09-09 Yekun Chai, Shuo Jin, Junliang Xing

Deep Learning Approaches on Image Captioning: A Review

Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Taraneh Ghandi, Hamidreza Pourreza, Hamidreza Mahyar

Actor-Critic Sequence Training for Image Captioning

Generating natural language descriptions of images is an important capability for a robot or other visual-intelligence driven AI agent that may need to communicate with human users about what it is seeing. Such image captioning methods are…

Computer Vision and Pattern Recognition · Computer Science 2017-11-29 Li Zhang, Flood Sung, Feng Liu, Tao Xiang, Shaogang Gong, Yongxin Yang, Timothy M. Hospedales

Learning from Children: Improving Image-Caption Pretraining via Curriculum

Image-caption pretraining has been quite successfully used for downstream vision tasks like zero-shot image classification and object detection. However, image-caption pretraining is still a hard problem -- it requires multiple concepts…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Hammad A. Ayyubi, Rahul Lokesh, Alireza Zareian, Bo Wu, Shih-Fu Chang

Image Captioning based on Feature Refinement and Reflective Decoding

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar, Hafida Benhidour, Said Kerrache

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive problem for the computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Ahmad Asadi, Reza Safabakhsh

ContCap: A scalable framework for continual image captioning

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image…

Computer Vision and Pattern Recognition · Computer Science 2020-04-22 Giang Nguyen, Tae Joon Jun, Trung Tran, Tolcha Yalew, Daeyoung Kim

Image Annotation using Multi-Layer Sparse Coding

Automatic annotation of images with descriptive words is a challenging problem with vast applications in the areas of image search and retrieval. This problem can be viewed as a label-assignment problem by a classifier dealing with a very…

Computer Vision and Pattern Recognition · Computer Science 2017-05-09 Amara Tariq, Hassan Foroosh

Delving Deeper into the Decoder for Video Captioning

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework is the most popular paradigm for this task in recent years. However, there exist some…

Computer Vision and Pattern Recognition · Computer Science 2021-02-15 Haoran Chen, Jianmin Li, Xiaolin Hu

Image Captioning based on Deep Reinforcement Learning

Recently it has shown that the policy-gradient methods for reinforcement learning have been utilized to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi, Peng Li, Bo Wang, Zhenyu Wang

Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson, Stephen Gould, Mark Johnson

Progressive refinement: a method of coarse-to-fine image parsing using stacked network

To parse images into fine-grained semantic parts, the complex fine-grained elements will put it in trouble when using off-the-shelf semantic segmentation networks. In this paper, for image parsing task, we propose to parse images from…

Computer Vision and Pattern Recognition · Computer Science 2018-04-24 Jiagao Hu, Zhengxing Sun, Yunhan Sun, Jinlong Shi

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li