Related papers: Hyperbolic Audio-visual Zero-shot Learning
Learning to classify video data from classes not included in the training data, i.e. video-based zero-shot learning, is challenging. We conjecture that the natural alignment between the audio and visual modalities in video data provides a…
Zero-shot learning (ZSL) can be defined by correctly solving a task where no training data is available, based on previous acquired knowledge from different, but related tasks. So far, this area has mostly drawn the attention from computer…
We present an audio-visual multimodal approach for the task of zeroshot learning (ZSL) for classification and retrieval of videos. ZSL has been studied extensively in the recent past but has primarily been limited to visual modality and to…
Audio-visual generalized zero-shot learning is a rapidly advancing domain that seeks to understand the intricate relations between audio and visual cues within videos. The overarching goal is to leverage insights from seen classes to…
Zero-shot learning is a new paradigm to classify objects from classes that are not available at training time. Zero-shot learning (ZSL) methods have attracted considerable attention in recent years because of their ability to classify…
In this paper, we propose a novel approach for generalized zero-shot learning in a multi-modal setting, where we have novel classes of audio/video during testing that are not seen during training. We use the semantic relatedness of text…
Zero-shot audio classification aims to recognize and classify a sound class that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute…
Zero-shot Learning(ZSL) attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information, which is a promising yet difficult research topic. In this field, Audio-Visual Generalized Zero-Shot…
Audio-visual generalised zero-shot learning for video classification requires understanding the relations between the audio and visual information in order to be able to recognise samples from novel, previously unseen classes at test time.…
Zero-shot learning (ZSL) aims to recognize classes that do not have samples in the training set. One representative solution is to directly learn an embedding function associating visual features with corresponding class semantics for…
Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of specific classes, which is achieved by exploiting the semantic information and auxiliary datasets. Recently most ZSL approaches focus on…
Zero-shot learning (ZSL) is concerned with the recognition of previously unseen classes. It relies on additional semantic knowledge for which a mapping can be learned with training examples of seen classes. While classical ZSL considers the…
Recent advances in audio-text cross-modal contrastive learning have shown its potential towards zero-shot learning. One possibility for this is by projecting item embeddings from pre-trained backbone neural networks into a cross-modal space…
Audio-visual zero-shot learning aims to recognize unseen classes based on paired audio-visual sequences. Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen…
Continual zero-shot learning(CZSL) is a new domain to classify objects sequentially the model has not seen during training. It is more suitable than zero-shot and continual learning approaches in real-case scenarios when data may come…
We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. Prior works in this context propose to map high-dimensional visual features to the semantic…
Generalised zero-shot learning (GZSL) methods aim to classify previously seen and unseen visual classes by leveraging the semantic information of those classes. In the context of GZSL, semantic information is non-visual data such as a text…
Supervised learning methods can solve the given problem in the presence of a large set of labeled data. However, the acquisition of a dataset covering all the target classes typically requires manual labeling which is expensive and…
Audio-visual correlation learning aims to capture essential correspondences and understand natural phenomena between audio and video. With the rapid growth of deep learning, an increasing amount of attention has been paid to this emerging…
Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point cloud that are unseen in the training phase. Recent trends favor the pipeline which transfers knowledge from seen classes with labels…