Related papers: Improving Image Captioning with Conditional Genera…

Towards Diverse and Natural Image Descriptions via a Conditional GAN

Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This…

Computer Vision and Pattern Recognition · Computer Science 2017-08-14 Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal…

Computer Vision and Pattern Recognition · Computer Science 2019-03-12 Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun

Controlled Caption Generation for Images Through Adversarial Attacks

Deep learning is found to be vulnerable to adversarial examples. However, its adversarial susceptibility in image caption generation is under-explored. We study adversarial examples for vision and language models, which typically adopt an…

Computer Vision and Pattern Recognition · Computer Science 2021-07-08 Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian

Can adversarial training learn image captioning ?

Recently, generative adversarial networks (GAN) have gathered a lot of interest. Their efficiency in generating unseen samples of high quality, especially images, has improved over the years. In the field of Natural Language Generation…

Computation and Language · Computer Science 2019-11-01 Jean-Benoit Delbrouck, Bastien Vanderplaetse, Stéphane Dupont

Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization

Automatically generating the descriptions of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence, which bridges the gap between computer vision and natural language processing. Based on the…

Computer Vision and Pattern Recognition · Computer Science 2019-01-14 Shiyang Yan, Yuan Xie, Fangyu Wu, Jeremy S. Smith, Wenjin Lu, Bailing Zhang

Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

We propose a novel lightweight generative adversarial network for efficient image manipulation using natural language descriptions. To achieve this, a new word-level discriminator is proposed, which provides the generator with fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2020-10-26 Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz

Vector Learning for Cross Domain Representations

Recently, generative adversarial networks have gained a lot of popularity for image generation tasks. However, such models are associated with complex learning mechanisms and demand very large relevant datasets. This work borrows concepts…

Machine Learning · Computer Science 2018-09-28 Shagan Sah, Chi Zhang, Thang Nguyen, Dheeraj Kumar Peri, Ameya Shringi, Raymond Ptucha

Image Captioning based on Deep Reinforcement Learning

Recently, policy-gradient methods for reinforcement learning have been utilized to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi, Peng Li, Bo Wang, Zhenyu Wang

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li

Diverse Audio Captioning via Adversarial Training

Audio captioning aims at generating natural language descriptions for audio clips automatically. Existing audio captioning models have shown promising improvement in recent years. However, these models are mostly trained via maximum…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-30 Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

In this work, we study the robustness of a CNN+RNN based image captioning system being subjected to adversarial noises. We propose to fool an image captioning system to generate some targeted partial captions for an image polluted by…

Computer Vision and Pattern Recognition · Computer Science 2019-05-13 Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu

Contextual RNN-GANs for Abstract Reasoning Diagram Generation

Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence. Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be…

Computer Vision and Pattern Recognition · Computer Science 2016-12-07 Arnab Ghosh, Viveka Kulharia, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal

Comparing Generative Adversarial Network Techniques for Image Creation and Modification

Generative adversarial networks (GANs) have been demonstrated to be successful at generating realistic real-world images. In this paper we compare various GAN techniques, both supervised and unsupervised. The effects on training stability of…

Machine Learning · Computer Science 2018-03-28 Mathijs Pieters, Marco Wiering

Guiding GANs: How to control non-conditional pre-trained GANs for conditional image generation

Generative Adversarial Networks (GANs) are an arrangement of two neural networks -- the generator and the discriminator -- that are jointly trained to generate artificial data, such as images, from random inputs. The quality of these generated…

Computer Vision and Pattern Recognition · Computer Science 2021-01-05 Manel Mateos, Alejandro González, Xavier Sevillano

Context-Aware Semantic Inpainting

Recently image inpainting has witnessed rapid progress due to generative adversarial networks (GAN) that are able to synthesize realistic contents. However, most existing GAN-based methods for semantic inpainting apply an auto-encoder…

Computer Vision and Pattern Recognition · Computer Science 2017-12-22 Haofeng Li, Guanbin Li, Liang Lin, Yizhou Yu

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Towards Generating Diverse Audio Captions via Adversarial Training

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences. This task has attracted increasing attention and substantial progress has been made in recent years.…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-02 Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Generative Adversarial Networks and Other Generative Models

Generative networks are fundamentally different in their aim and methods compared to CNNs for classification, segmentation, or object detection. They have initially not been meant to be an image analysis tool, but to produce naturally…

Computer Vision and Pattern Recognition · Computer Science 2022-07-11 Markus Wenzel

Adversarial Inference for Multi-Sentence Video Description

While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among…

Computer Vision and Pattern Recognition · Computer Science 2019-04-17 Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

Reflective Decoding Network for Image Captioning

State-of-the-art image captioning methods mostly focus on improving visual features; less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary…

Computer Vision and Pattern Recognition · Computer Science 2019-09-02 Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai