Related papers: Using Deep Object Features for Image Descriptions

200 papers

Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding…

Machine Learning · Computer Science 2014-11-11 Ryan Kiros , Ruslan Salakhutdinov , Richard S. Zemel

Image captioning models typically follow an encoder-decoder architecture which uses abstract image feature vectors as input to the encoder. One of the most successful algorithms uses feature vectors extracted from the region proposals…

Computer Vision and Pattern Recognition · Computer Science 2020-01-14 Simao Herdade , Armin Kappeler , Kofi Boakye , Joao Soares

Image captioning is the process of automatically generating a description of an image in natural language. Image captioning is one of the significant challenges in image understanding since it requires not only recognizing salient objects…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ghadah Alabduljabbar , Hafida Benhidour , Said Kerrache

People can view the same image differently: they focus on different regions, objects, and details in varying orders and describe them in distinct linguistic styles. This leads to substantial variability in image descriptions. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Ruoyu Xue , Hieu Le , Jingyi Xu , Sounak Mondal , Abe Leite , Gregory Zelinsky , Minh Hoai , Dimitris Samaras

This paper proposes a method for generating images of customized objects specified by users. The method is based on a general framework that bypasses the lengthy optimization required by previous approaches, which often employ a per-object…

Computer Vision and Pattern Recognition · Computer Science 2023-04-06 Xuhui Jia , Yang Zhao , Kelvin C. K. Chan , Yandong Li , Han Zhang , Boqing Gong , Tingbo Hou , Huisheng Wang , Yu-Chuan Su

We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels. To ground this prediction process in…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Kaiyu Yue , Bor-Chun Chen , Jonas Geiping , Hengduo Li , Tom Goldstein , Ser-Nam Lim

Image captioning is a challenging task that is attracting increasing attention in the field of Artificial Intelligence, and it can be applied to efficient image retrieval, intelligent blind guidance, human-computer interaction, etc. In…

Computer Vision and Pattern Recognition · Computer Science 2019-05-21 Yiyu Wang , Jungang Xu , Yingfei Sun , Ben He

Understanding 3D scenes goes beyond simply recognizing objects; it requires reasoning about the spatial and semantic relationships between them. Current 3D scene-language models often struggle with this relational understanding,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Jintang Xue , Ganning Zhao , Jie-En Yao , Hong-En Chen , Yue Hu , Meida Chen , Suya You , C.-C. Jay Kuo

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

Generating textual descriptions for images has been an attractive problem for the computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Ahmad Asadi , Reza Safabakhsh

Automatic transcription of scene understanding in images and videos is a step towards artificial general intelligence. Image captioning is a nomenclature for describing meaningful information in an image using computer vision techniques.…

Computer Vision and Pattern Recognition · Computer Science 2021-09-17 Shikha Dubey , Farrukh Olimov , Muhammad Aasim Rafique , Joonmo Kim , Moongu Jeon

Contemporary deep learning based video captioning follows encoder-decoder framework. In encoder, visual features are extracted with 2D/3D Convolutional Neural Networks (CNNs) and a transformed version of those features is passed to the…

Computer Vision and Pattern Recognition · Computer Science 2019-11-22 Nayyer Aafaq , Naveed Akhtar , Wei Liu , Ajmal Mian

Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that…

Computer Vision and Pattern Recognition · Computer Science 2017-06-09 Zhongliang Yang , Yu-Jin Zhang , Sadaqat ur Rehman , Yongfeng Huang

Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number…

Computer Vision and Pattern Recognition · Computer Science 2017-07-24 Subhashini Venugopalan , Lisa Anne Hendricks , Marcus Rohrbach , Raymond Mooney , Trevor Darrell , Kate Saenko

Connecting multiple machine learning models into a pipeline is effective for handling complex problems. By breaking down the problem into steps, each tackled by a specific component model of the pipeline, the overall solution can be made…

Computer Vision and Pattern Recognition · Computer Science 2021-01-20 Tomoe Kishimoto , Masahiko Saito , Junichi Tanaka , Yutaro Iiyama , Ryu Sawada , Koji Terashi

The use of explicit object detectors as an intermediate step to image captioning - which used to constitute an essential stage in early work - is often bypassed in the currently dominant end-to-end approaches, where the language model is…

Computer Vision and Pattern Recognition · Computer Science 2018-05-02 Josiah Wang , Pranava Madhyastha , Lucia Specia

The paper presents a new model for single channel images low-level interpretation. The image is decomposed into a graph which captures a complete set of structural features. The description allows to accurately identify every edge location…

Computer Vision and Pattern Recognition · Computer Science 2019-04-23 Alessandro Dal Palù

Image segmentation is the task of associating pixels in an image with their respective object class labels. It has a wide range of applications in many industries including healthcare, transportation, robotics, fashion, home improvement,…

Computer Vision and Pattern Recognition · Computer Science 2023-01-19 Yuanbo Wang , Unaiza Ahsan , Hanyan Li , Matthew Hagen

Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images,…

Computer Vision and Pattern Recognition · Computer Science 2017-08-09 Linjie Yang , Kevin Tang , Jianchao Yang , Li-Jia Li

Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally…

Robotics · Computer Science 2024-02-07 Nicolas Harvey Chapman , Feras Dayoub , Will Browne , Chris Lehnert