Related papers: Learning Visual N-Grams from Web Data

200 papers

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition…

Computer Vision and Pattern Recognition · Computer Science 2024-02-19 Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any…

Computer Vision and Pattern Recognition · Computer Science 2021-03-02 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

In a real-world setting, visual recognition systems can be brought to make predictions for images belonging to previously unknown class labels. In order to make semantically meaningful predictions for such inputs, we propose a two-step…

Machine Learning · Computer Science 2017-08-29 Vincent P. A. Lonij, Ambrish Rawat, Maria-Irina Nicolae

We propose a visually grounded speech model that acquires new words and their visual depictions from just a few word-image example pairs. Given a set of test images and a spoken query, we ask the model which image depicts the query word.…

Computation and Language · Computer Science 2023-05-31 Leanne Nortje, Benjamin van Niekerk, Herman Kamper

Convolutional networks trained on large supervised dataset produce visual features which form the basis for the state-of-the-art in many computer-vision problems. Further improvements of these visual features will likely require even larger…

Computer Vision and Pattern Recognition · Computer Science 2015-11-10 Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache

We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual information into account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM) build vector-based word representations by learning to predict linguistic contexts in…

Computation and Language · Computer Science 2015-03-13 Angeliki Lazaridou, Nghia The Pham, Marco Baroni

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations the current best complement to visual features are attributes: manually encoded…

Computer Vision and Pattern Recognition · Computer Science 2016-05-19 Scott Reed, Zeynep Akata, Bernt Schiele, Honglak Lee

Large scale vision and language models can achieve impressive zero-shot recognition performance by mapping class specific text queries to image content. Two distinct challenges that remain however, are high sensitivity to the choice of…

Computer Vision and Pattern Recognition · Computer Science 2023-04-05 Sarah Parisot, Yongxin Yang, Steven McDonagh

Recent advances in zero-shot image recognition suggest that vision-language models learn generic visual representations with a high degree of semantic information that may be arbitrarily probed with natural language phrases. Understanding…

Computer Vision and Pattern Recognition · Computer Science 2023-08-23 Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens

In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method is able to efficiently…

Computer Vision and Pattern Recognition · Computer Science 2015-10-05 Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

The aim of this paper is to present a new method for visual place recognition. Our system combines global image characterization and visual words, which allows to use efficient Bayesian filtering methods to integrate several images. More…

Machine Learning · Statistics 2014-03-24 Mathieu Dubois, Frenoux Emmanuelle, Philippe Tarroux

Deep neural networks have become the default choice for many applications like image and video recognition, segmentation and other image and video related tasks. However, a critical challenge with these models is the lack of…

Computer Vision and Pattern Recognition · Computer Science 2021-09-02 Sunil Kumar Vengalil, Neelam Sinha

Recent successes in visual recognition can be primarily attributed to feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less…

Computer Vision and Pattern Recognition · Computer Science 2019-06-10 Yazhou Yao, Jian Zhang, Xiansheng Hua, Fumin Shen, Zhenmin Tang

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to…

Computer Vision and Pattern Recognition · Computer Science 2016-02-22 Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

Learning to fuse vision and language information and representing them is an important research problem with many applications. Recent progresses have leveraged the ideas of pre-training (from language modeling) and attention layers in…

Computer Vision and Pattern Recognition · Computer Science 2020-10-08 Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha

What does learning to model relationships between strings teach large language models (LLMs) about the visual world? We systematically evaluate LLMs' abilities to generate and recognize an assortment of visual concepts of increasing…

Computer Vision and Pattern Recognition · Computer Science 2024-01-04 Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, Antonio Torralba

We present an empirical analysis of the state-of-the-art systems for referring expression recognition -- the task of identifying the object in an image referred to by a natural language expression -- with the goal of gaining insight into…

Computation and Language · Computer Science 2018-05-31 Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick

During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful…

Computation and Language · Computer Science 2017-05-29 Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu

Humans learn language by interaction with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech but so far most approaches require text. We improve on…

Computation and Language · Computer Science 2019-09-25 Danny Merkx, Stefan L. Frank, Mirjam Ernestus

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images. Recent work has shown that learning from textual descriptions, such as Wikipedia articles, avoids the problem of…

Machine Learning · Computer Science 2015-09-28 Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov