Related papers: Deep image representations using caption generator…

Empirical Analysis of Image Caption Generation using Deep Learning

Automated image captioning is one of the applications of Deep Learning which involves fusion of work done in computer vision and natural language processing, and it is typically performed using Encoder-Decoder architectures. In this…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Aditya Bhattacharya, Eshwar Shamanna Girishekar, Padmakar Anil Deshpande

DEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural Nets

Most of the approaches for discovering visual attributes in images demand significant supervision, which is cumbersome to obtain. In this paper, we aim to discover visual attributes in a weakly supervised setting that is commonly…

Computer Vision and Pattern Recognition · Computer Science 2015-04-21 Sukrit Shankar, Vikas K. Garg, Roberto Cipolla

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

The immense success of deep learning based methods in computer vision heavily relies on large scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such…

Computer Vision and Pattern Recognition · Computer Science 2018-07-09 Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Dense Captioning with Joint Inference and Visual Context

Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images,…

Computer Vision and Pattern Recognition · Computer Science 2017-08-09 Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

Scene labeling is a challenging classification problem where each input image requires a pixel-level prediction map. Recently, deep-learning-based methods have shown their effectiveness on solving this problem. However, we argue that the…

Computer Vision and Pattern Recognition · Computer Science 2017-06-12 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Enhancing Image Captioning with Neural Models

This research explores the realm of neural image captioning using deep learning models. The study investigates the performance of different neural architecture configurations, focusing on the inject architecture, and proposes a novel…

Computer Vision and Pattern Recognition · Computer Science 2023-12-04 Pooja Bhatnagar, Sai Mrunaal, Sachin Kamnure

From Show to Tell: A Survey on Deep Learning-based Image Captioning

Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, large research efforts have been devoted to image captioning, i.e. describing images with syntactically and semantically meaningful…

Computer Vision and Pattern Recognition · Computer Science 2021-12-02 Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara

Why Are Deep Representations Good Perceptual Quality Features?

Recently, intermediate feature maps of pre-trained convolutional neural networks have shown significant perceptual quality improvements, when they are used in the loss function for training new networks. It is believed that these features…

Computer Vision and Pattern Recognition · Computer Science 2020-07-24 Taimoor Tariq, Okan Tarhan Tursun, Munchurl Kim, Piotr Didyk

ContCap: A scalable framework for continual image captioning

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image…

Computer Vision and Pattern Recognition · Computer Science 2020-04-22 Giang Nguyen, Tae Joon Jun, Trung Tran, Tolcha Yalew, Daeyoung Kim

Controlled Caption Generation for Images Through Adversarial Attacks

Deep learning is found to be vulnerable to adversarial examples. However, its adversarial susceptibility in image caption generation is under-explored. We study adversarial examples for vision and language models, which typically adopt an…

Computer Vision and Pattern Recognition · Computer Science 2021-07-08 Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian

Image Captioning based on Deep Reinforcement Learning

Recently it has been shown that policy-gradient methods for reinforcement learning can be used to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi, Peng Li, Bo Wang, Zhenyu Wang

TIPS: Text-Image Pretraining with Spatial awareness

While image-text representation learning has become very popular in recent years, existing models tend to lack spatial awareness and have limited direct applicability for dense understanding tasks. For this reason, self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh, Arjun Karpur, Koert Chen, Ye Xia, Bingyi Cao, Daniel Salz, Guangxing Han, Jan Dlabal, Dan Gnanapragasam, Mojtaba Seyedhosseini, Howard Zhou, Andre Araujo

Deep Deconvolutional Networks for Scene Parsing

Scene parsing is an important and challenging problem in computer vision. It requires labeling each pixel in an image with the category it belongs to. Traditionally, it has been approached with hand-engineered features from color…

Machine Learning · Statistics 2014-11-18 Rahul Mohan

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection when the descriptions…

Computer Vision and Pattern Recognition · Computer Science 2015-11-25 Justin Johnson, Andrej Karpathy, Li Fei-Fei

Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks

Semantic labeling (or pixel-level land-cover classification) in ultra-high resolution imagery (< 10cm) requires statistical models able to learn high level concepts from spatial data, with large appearance variations. Convolutional Neural…

Computer Vision and Pattern Recognition · Computer Science 2017-03-08 Michele Volpi, Devis Tuia

Deep Learning Approaches on Image Captioning: A Review

Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Taraneh Ghandi, Hamidreza Pourreza, Hamidreza Mahyar

Supervised and Contrastive Self-Supervised In-Domain Representation Learning for Dense Prediction Problems in Remote Sensing

In recent years, convolutional neural networks (CNNs) have made significant progress in computer vision. These advancements have been applied to other areas, such as remote sensing, and have shown satisfactory results. However, the lack of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-01 Ali Ghanbarzade, Hossein Soleimani

A Comprehensive Survey of Deep Learning for Image Captioning

Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes, and their relationships in an image. It also needs to generate syntactically and semantically…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga

Unsupervised Image Captioning

Deep neural networks have achieved great successes on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire. In this paper, we make the first…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

Dense Video Captioning Using Unsupervised Semantic Information

We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Valter Estevam, Rayson Laroca, Helio Pedrini, David Menotti