Related papers: Hyperbolic Audio-visual Zero-shot Learning

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

Learning to classify video data from classes not included in the training data, i.e. video-based zero-shot learning, is challenging. We conjecture that the natural alignment between the audio and visual modalities in video data provides a…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Hyper-Process Model: A Zero-Shot Learning algorithm for Regression Problems based on Shape Analysis

Zero-shot learning (ZSL) can be defined by correctly solving a task where no training data is available, based on previously acquired knowledge from different, but related tasks. So far, this area has mostly drawn the attention from computer…

Computer Vision and Pattern Recognition · Computer Science 2018-10-25 Joao Reis, Gil Gonçalves

Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos

We present an audio-visual multimodal approach for the task of zeroshot learning (ZSL) for classification and retrieval of videos. ZSL has been studied extensively in the recent past but has primarily been limited to visual modality and to…

Computer Vision and Pattern Recognition · Computer Science 2019-10-22 Kranti Kumar Parida, Neeraj Matiyali, Tanaya Guha, Gaurav Sharma

Audio-visual Generalized Zero-shot Learning the Easy Way

Audio-visual generalized zero-shot learning is a rapidly advancing domain that seeks to understand the intricate relations between audio and visual cues within videos. The overarching goal is to leverage insights from seen classes to…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Shentong Mo, Pedro Morgado

Generative Replay-based Continual Zero-Shot Learning

Zero-shot learning is a new paradigm to classify objects from classes that are not available at training time. Zero-shot learning (ZSL) methods have attracted considerable attention in recent years because of their ability to classify…

Computer Vision and Pattern Recognition · Computer Science 2021-06-08 Chandan Gautam, Sethupathy Parameswaran, Ashish Mishra, Suresh Sundaram

AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings

In this paper, we propose a novel approach for generalized zero-shot learning in a multi-modal setting, where we have novel classes of audio/video during testing that are not seen during training. We use the semantic relatedness of text…

Computer Vision and Pattern Recognition · Computer Science 2020-11-24 Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri

Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Zero-shot audio classification aims to recognize and classify a sound class that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute…

Sound · Computer Science 2024-07-22 Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu

Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning

Zero-shot Learning (ZSL) attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information, which is a promising yet difficult research topic. In this field, Audio-Visual Generalized Zero-Shot…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Yang Liu, Xun Zhang, Jiale Du, Xinbo Gao, Jungong Han

Temporal and cross-modal attention for audio-visual zero-shot learning

Audio-visual generalised zero-shot learning for video classification requires understanding the relations between the audio and visual information in order to be able to recognise samples from novel, previously unseen classes at test time.…

Computer Vision and Pattern Recognition · Computer Science 2022-07-21 Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Boosting Zero-shot Learning via Contrastive Optimization of Attribute Representations

Zero-shot learning (ZSL) aims to recognize classes that do not have samples in the training set. One representative solution is to directly learn an embedding function associating visual features with corresponding class semantics for…

Computer Vision and Pattern Recognition · Computer Science 2023-07-19 Yu Du, Miaojing Shi, Fangyun Wei, Guoqi Li

Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition

Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of specific classes, which is achieved by exploiting the semantic information and auxiliary datasets. Recently most ZSL approaches focus on…

Computer Vision and Pattern Recognition · Computer Science 2018-07-25 Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen

From Classical to Generalized Zero-Shot Learning: a Simple Adaptation Process

Zero-shot learning (ZSL) is concerned with the recognition of previously unseen classes. It relies on additional semantic knowledge for which a mapping can be learned with training examples of seen classes. While classical ZSL considers the…

Machine Learning · Computer Science 2019-01-16 Yannick Le Cacheux, Hervé Le Borgne, Michel Crucianu

On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning

Recent advances in audio-text cross-modal contrastive learning have shown its potential towards zero-shot learning. One possibility for this is by projecting item embeddings from pre-trained backbone neural networks into a cross-modal space…

Sound · Computer Science 2025-09-29 Tiago Tavares, Fabio Ayres, Zhepei Wang, Paris Smaragdis

Boosting Audio-visual Zero-shot Learning with Large Language Models

Audio-visual zero-shot learning aims to recognize unseen classes based on paired audio-visual sequences. Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen…

Computer Vision and Pattern Recognition · Computer Science 2024-04-25 Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

Dynamic VAEs with Generative Replay for Continual Zero-shot Learning

Continual zero-shot learning (CZSL) is a new domain to classify objects sequentially the model has not seen during training. It is more suitable than zero-shot and continual learning approaches in real-case scenarios when data may come…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Subhankar Ghosh

Generalized Zero-Shot Recognition based on Visually Semantic Embedding

We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. Prior works in this context propose to map high-dimensional visual features to the semantic…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama

Generalised Zero-Shot Learning with a Classifier Ensemble over Multi-Modal Embedding Spaces

Generalised zero-shot learning (GZSL) methods aim to classify previously seen and unseen visual classes by leveraging the semantic information of those classes. In the context of GZSL, semantic information is non-visual data such as a text…

Computer Vision and Pattern Recognition · Computer Science 2019-08-07 Rafael Felix, Ben Harwood, Michele Sasdelli, Gustavo Carneiro

Zero-Shot Audio Classification using Image Embeddings

Supervised learning methods can solve the given problem in the presence of a large set of labeled data. However, the acquisition of a dataset covering all the target classes typically requires manual labeling which is expensive and…

Sound · Computer Science 2022-06-13 Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Audio-visual correlation learning aims to capture essential correspondences and understand natural phenomena between audio and video. With the rapid growth of deep learning, an increasing amount of attention has been paid to this emerging…

Multimedia · Computer Science 2025-12-30 Luís Vilaça, Yi Yu, Paula Viana

See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data

Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point cloud that are unseen in the training phase. Recent trends favor the pipeline which transfers knowledge from seen classes with labels…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Yuhang Lu, Qi Jiang, Runnan Chen, Yuenan Hou, Xinge Zhu, Yuexin Ma