Related papers: Write a Classifier: Predicting Visual Classifiers …

Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions

In this paper we propose a framework for predicting kernelized classifiers in the visual domain for categories with no training images where the knowledge comes from textual description about these categories. Through our optimization…

Computer Vision and Pattern Recognition · Computer Science 2015-06-30 Mohamed Elhoseiny , Ahmed Elgammal , Babak Saleh

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research. Along with the growth of computational capacity, we now have open-source vision-language pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Wenhao Wu , Zhun Sun , Wanli Ouyang

Visually grounded learning of keyword prediction from untranscribed speech

During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful…

Computation and Language · Computer Science 2017-05-29 Herman Kamper , Shane Settle , Gregory Shakhnarovich , Karen Livescu

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images. Recent work has shown that learning from textual descriptions, such as Wikipedia articles, avoids the problem of…

Machine Learning · Computer Science 2015-09-28 Jimmy Ba , Kevin Swersky , Sanja Fidler , Ruslan Salakhutdinov

Transfer Learning via Unsupervised Task Discovery for Visual Question Answering

We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in visual question answering task. Existing large-scale visual datasets with annotations such as image class labels, bounding boxes and…

Machine Learning · Computer Science 2019-04-09 Hyeonwoo Noh , Taehoon Kim , Jonghwan Mun , Bohyung Han

Kernel Coding: General Formulation and Special Cases

Representing images by compact codes has proven beneficial for many visual recognition tasks. Most existing techniques, however, perform this coding step directly in image feature space, where the distributions of the different classes are…

Computer Vision and Pattern Recognition · Computer Science 2014-09-02 Mehrtash Harandi , Mathieu Salzmann

Text Classification with Few Examples using Controlled Generalization

Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems. This means classifiers must generalize from limited evidence, but…

Computation and Language · Computer Science 2020-05-19 Abhijit Mahabal , Jason Baldridge , Burcu Karagol Ayan , Vincent Perot , Dan Roth

A Multi-class Approach -- Building a Visual Classifier based on Textual Descriptions using Zero-Shot Learning

Machine Learning (ML) techniques for image classification routinely require many labelled images for training the model and while testing, we ought to use images belonging to the same domain as those used for training. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Preeti Jagdish Sajjan , Frank G. Glavin

Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning

Leveraging class semantic descriptions and examples of known objects, zero-shot learning makes it possible to train a recognition model for an object class whose examples are not available. In this paper, we propose a novel zero-shot…

Computer Vision and Pattern Recognition · Computer Science 2017-08-22 Soravit Changpinyo , Wei-Lun Chao , Fei Sha

Generating Visual Explanations

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text;…

Computer Vision and Pattern Recognition · Computer Science 2016-03-29 Lisa Anne Hendricks , Zeynep Akata , Marcus Rohrbach , Jeff Donahue , Bernt Schiele , Trevor Darrell

Multiple Kernel Sparse Representations for Supervised and Unsupervised Learning

In complex visual recognition tasks it is typical to adopt multiple descriptors, that describe different aspects of the images, for obtaining an improved recognition performance. Descriptors that have diverse forms can be fused into a…

Computer Vision and Pattern Recognition · Computer Science 2015-06-15 Jayaraman J. Thiagarajan , Karthikeyan Natesan Ramamurthy , Andreas Spanias

Seeing in Words: Learning to Classify through Language Bottlenecks

Neural networks for computer vision extract uninterpretable features despite achieving high accuracy on benchmarks. In contrast, humans can explain their predictions using succinct and intuitive descriptions. To incorporate explainability…

Computer Vision and Pattern Recognition · Computer Science 2023-07-04 Khalid Saifullah , Yuxin Wen , Jonas Geiping , Micah Goldblum , Tom Goldstein

Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and use the embeddings and the image features jointly to generate narrative sentences. We use the embeddings…

Computer Vision and Pattern Recognition · Computer Science 2020-01-15 Bowen Zhang , Hexiang Hu , Fei Sha

Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision

In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to its relevant parts and suppress…

Computer Vision and Pattern Recognition · Computer Science 2017-09-06 Mohamed Elhoseiny , Yizhe Zhu , Han Zhang , Ahmed Elgammal

TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models

Interpreting the learned features of vision models has posed a longstanding challenge in the field of machine learning. To address this issue, we propose a novel method that leverages the capabilities of language models to interpret the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-03 Saeid Asgari Taghanaki , Aliasghar Khani , Ali Saheb Pasand , Amir Khasahmadi , Aditya Sanghi , Karl D. D. Willis , Ali Mahdavi-Amiri

Learning to Read by Spelling: Towards Unsupervised Text Recognition

This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images, with…

Computer Vision and Pattern Recognition · Computer Science 2018-12-11 Ankush Gupta , Andrea Vedaldi , Andrew Zisserman

Convolutional Kernel Networks

An important goal in visual recognition is to devise image representations that are invariant to particular transformations. In this paper, we address this goal with a new type of convolutional neural network (CNN) whose invariance is…

Computer Vision and Pattern Recognition · Computer Science 2015-01-08 Julien Mairal , Piotr Koniusz , Zaid Harchaoui , Cordelia Schmid

An efficient framework for learning sentence representations

In this work we propose a simple and efficient framework for learning sentence representations from unlabelled data. Drawing inspiration from the distributional hypothesis and recent work on learning sentence representations, we reformulate…

Computation and Language · Computer Science 2018-03-09 Lajanugen Logeswaran , Honglak Lee

Learning Functional Distributional Semantics with Visual Data

Functional Distributional Semantics is a recently proposed framework for learning distributional semantics that provides linguistic interpretability. It models the meaning of a word as a binary classifier rather than a numerical vector. In…

Computation and Language · Computer Science 2022-04-25 Yinhong Liu , Guy Emerson

Learning Visual N-Grams from Web Data

Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a…

Computer Vision and Pattern Recognition · Computer Science 2017-08-08 Ang Li , Allan Jabri , Armand Joulin , Laurens van der Maaten