English
Related papers

Related papers: Stack-Captioning: Coarse-to-Fine Learning for Imag…

200 papers

Recently, automatic image caption generation has been an important focus of the work on multimodal translation task. Existing approaches can be roughly categorized into two classes, i.e., top-down and bottom-up, the former transfers the…

Computer Vision and Pattern Recognition · Computer Science 2019-09-06 Wei Wei , Ling Cheng , Xianling Mao , Guangyou Zhou , Feida Zhu

It is encouraged to see that progress has been made to bridge videos and natural language. However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and prefer…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Bang Yang , Yuexian Zou , Fenglin Liu , Can Zhang

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi , Xu Zhou , Xipeng Qiu , Xiaodan Zhu

Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Jie Wu , Tianshui Chen , Hefeng Wu , Zhi Yang , Guangchun Luo , Liang Lin

Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The…

Computation and Language · Computer Science 2020-08-07 Arushi Goel , Basura Fernando , Thanh-Son Nguyen , Hakan Bilen

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen , Aurelien Lucchi , Thomas Hofmann

Automatically translating images to texts involves image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided…

Computation and Language · Computer Science 2021-09-09 Yekun Chai , Shuo Jin , Junliang Xing

Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Taraneh Ghandi , Hamidreza Pourreza , Hamidreza Mahyar

Generating natural language descriptions of images is an important capability for a robot or other visual-intelligence driven AI agent that may need to communicate with human users about what it is seeing. Such image captioning methods are…

Computer Vision and Pattern Recognition · Computer Science 2017-11-29 Li Zhang , Flood Sung , Feng Liu , Tao Xiang , Shaogang Gong , Yongxin Yang , Timothy M. Hospedales

Image-caption pretraining has been quite successfully used for downstream vision tasks like zero-shot image classification and object detection. However, image-caption pretraining is still a hard problem -- it requires multiple concepts…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Hammad A. Ayyubi , Rahul Lokesh , Alireza Zareian , Bo Wu , Shih-Fu Chang

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar , Hafida Benhidour , Said Kerrache

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin

Generating textual descriptions for images has been an attractive problem for the computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Ahmad Asadi , Reza Safabakhsh

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image…

Computer Vision and Pattern Recognition · Computer Science 2020-04-22 Giang Nguyen , Tae Joon Jun , Trung Tran , Tolcha Yalew , Daeyoung Kim

Automatic annotation of images with descriptive words is a challenging problem with vast applications in the areas of image search and retrieval. This problem can be viewed as a label-assignment problem by a classifier dealing with a very…

Computer Vision and Pattern Recognition · Computer Science 2017-05-09 Amara Tariq , Hassan Foroosh

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework is the most popular paradigm for this task in recent years. However, there exist some…

Computer Vision and Pattern Recognition · Computer Science 2021-02-15 Haoran Chen , Jianmin Li , Xiaolin Hu

Recently it has shown that the policy-gradient methods for reinforcement learning have been utilized to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi , Peng Li , Bo Wang , Zhenyu Wang

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a…

Computer Vision and Pattern Recognition · Computer Science 2018-11-29 Peter Anderson , Stephen Gould , Mark Johnson

To parse images into fine-grained semantic parts, the complex fine-grained elements will put it in trouble when using off-the-shelf semantic segmentation networks. In this paper, for image parsing task, we propose to parse images from…

Computer Vision and Pattern Recognition · Computer Science 2018-04-24 Jiagao Hu , Zhengxing Sun , Yunhan Sun , Jinlong Shi

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren , Xiaoyu Wang , Ning Zhang , Xutao Lv , Li-Jia Li
‹ Prev 1 2 3 10 Next ›