Related papers: Hierarchical Memory Decoding for Video Captioning

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent…

Computer Vision and Pattern Recognition · Computer Science 2018-11-26 Lorenzo Baraldi , Costantino Grana , Rita Cucchiara

End-to-End Video Captioning

Building correspondences across different modalities, such as video and language, has recently become critical in many visual recognition applications, such as video captioning. Inspired by machine translation, recent models tackle this…

Computer Vision and Pattern Recognition · Computer Science 2019-11-11 Silvio Olivastri , Gurkirt Singh , Fabio Cuzzolin

Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning

Recently, deep learning approach, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy with fast processing speed for image classification. Incorporating temporal structure with deep ConvNets for…

Computer Vision and Pattern Recognition · Computer Science 2015-11-12 Pingbo Pan , Zhongwen Xu , Yi Yang , Fei Wu , Yueting Zhuang

Memory-Attended Recurrent Network for Video Captioning

Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed. A potential disadvantage of such design is that it cannot capture the multiple visual context…

Computer Vision and Pattern Recognition · Computer Science 2019-05-13 Wenjie Pei , Jiyuan Zhang , Xiangrong Wang , Lei Ke , Xiaoyong Shen , Yu-Wing Tai

Reconstruction Network for Video Captioning

In this paper, the problem of describing visual contents of a video sequence with natural language is addressed. Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2018-04-02 Bairui Wang , Lin Ma , Wei Zhang , Wei Liu

Recurrent Fusion Network for Image Captioning

Recently, much advance has been made in image captioning, and an encoder-decoder framework has been adopted by all the state-of-the-art models. Under this framework, an input image is encoded by a convolutional neural network (CNN) and then…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Wenhao Jiang , Lin Ma , Yu-Gang Jiang , Wei Liu , Tong Zhang

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Qingzhong Wang , Antoni B. Chan

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning

In this paper, the problem of describing visual contents of a video sequence with natural language is addressed. Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2019-06-05 Wei Zhang , Bairui Wang , Lin Ma , Wei Liu

Reflective Decoding Network for Image Captioning

State-of-the-art image captioning methods mostly focus on improving visual features, less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary…

Computer Vision and Pattern Recognition · Computer Science 2019-09-02 Lei Ke , Wenjie Pei , Ruiyu Li , Xiaoyong Shen , Yu-Wing Tai

Multimodal Memory Modelling for Video Captioning

Video captioning which automatically translates video clips into natural language sentences is a very important task in computer vision. By virtue of recent deep learning technologies, e.g., convolutional neural networks (CNNs) and…

Computer Vision and Pattern Recognition · Computer Science 2016-11-18 Junbo Wang , Wei Wang , Yan Huang , Liang Wang , Tieniu Tan

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

It is well believed that video captioning is a fundamental but challenging task in both computer vision and artificial intelligence fields. The prevalent approach is to map an input video to a variable-length output sentence in a sequence…

Computer Vision and Pattern Recognition · Computer Science 2019-05-06 Jingwen Chen , Yingwei Pan , Yehao Li , Ting Yao , Hongyang Chao , Tao Mei

Watch It Twice: Video Captioning with a Refocused Video Encoder

With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance toward visually-impaired people, video captioning task has received a lot of attention recently in…

Computer Vision and Pattern Recognition · Computer Science 2019-07-31 Xiangxi Shi , Jianfei Cai , Shafiq Joty , Jiuxiang Gu

Beyond Short Snippets: Deep Networks for Video Classification

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep…

Computer Vision and Pattern Recognition · Computer Science 2015-04-14 Joe Yue-Hei Ng , Matthew Hausknecht , Sudheendra Vijayanarasimhan , Oriol Vinyals , Rajat Monga , George Toderici

A sequential guiding network with attention for image captioning

The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, by which we can deal with more challenging tasks such as automatic description…

Computer Vision and Pattern Recognition · Computer Science 2019-02-12 Daouda Sow , Zengchang Qin , Mouhamed Niasse , Tao Wan

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

The explosion of video data on the internet requires effective and efficient technology to generate captions automatically for people who are not able to watch the videos. Despite the great progress of video captioning research,…

Computer Vision and Pattern Recognition · Computer Science 2018-07-11 Xiangxi Shi , Jianfei Cai , Jiuxiang Gu , Shafiq Joty

A Critical Review of Recurrent Neural Networks for Sequence Learning

Countless learning tasks require dealing with sequential data. Image captioning, speech synthesis, and music generation all require that a model produce outputs that are sequences. In other domains, such as time series prediction, video…

Machine Learning · Computer Science 2015-10-20 Zachary C. Lipton , John Berkowitz , Charles Elkan

MNeRV: A Multilayer Neural Representation for Videos

As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Qingling Chang , Haohui Yu , Shuxuan Fu , Zhiqiang Zeng , Chuangquan Chen

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

Analysis of Convolutional Decoder for Image Caption Generation

Recently Convolutional Neural Networks have been proposed for Sequence Modelling tasks such as Image Caption Generation. However, unlike Recurrent Neural Networks, the performance of Convolutional Neural Networks as Decoders for Image…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Sulabh Katiyar , Samir Kumar Borgohain

Better Understanding Hierarchical Visual Relationship for Image Caption

The Convolutional Neural Network (CNN) has been the dominant image feature extractor in computer vision for years. However, it fails to get the relationship between images/objects and their hierarchical interactions which can be helpful for…

Computer Vision and Pattern Recognition · Computer Science 2019-12-05 Zheng-cong Fei