Related papers: Enhancing Zero-shot Audio Classification using Sou…

Zero-Shot Audio Classification Based on Class Label Embeddings

This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear…

Machine Learning · Computer Science · 2019-08-08 · Huang Xie, Tuomas Virtanen

Zero-Shot Audio Classification via Semantic Embeddings

In this paper, we study zero-shot learning in audio classification via semantic embeddings extracted from textual labels and sentence descriptions of sound classes. Our goal is to obtain a classifier that is capable of recognizing audio…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-02-12 · Huang Xie, Tuomas Virtanen

Multi-label Zero-Shot Audio Classification with Temporal Attention

Zero-shot learning models are capable of classifying new classes by transferring knowledge from the seen classes using auxiliary information. While most of the existing zero-shot learning methods focused on single-label classification…

Sound · Computer Science · 2024-09-04 · Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

Zero-Shot Audio Classification using Image Embeddings

Supervised learning methods can solve the given problem in the presence of a large set of labeled data. However, the acquisition of a dataset covering all the target classes typically requires manual labeling which is expensive and…

Sound · Computer Science · 2022-06-13 · Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification

Audio-text models trained via contrastive learning offer a practical approach to perform audio classification through natural language prompts, such as "this is a sound of" followed by category names. In this work, we explore alternative…

Sound · Computer Science · 2024-09-23 · Michel Olvera, Paraskevas Stamatiadis, Slim Essid

Zero-shot Learning for Audio-based Music Classification and Tagging

Audio-based music classification and tagging is typically based on categorical supervised learning with a fixed set of labels. This intrinsically cannot handle unseen labels such as newly added music genres or semantic words that users…

Machine Learning · Computer Science · 2020-03-20 · Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam

Improving Audio Classification by Transitioning from Zero- to Few-Shot

State-of-the-art audio classification often employs a zero-shot approach, which involves comparing audio embeddings with embeddings from text describing the respective audio class. These embeddings are usually generated by neural networks…

Sound · Computer Science · 2025-07-29 · James Taylor, Wolfgang Mack

Zero-shot Sound Event Classification Using a Sound Attribute Vector with Global and Local Feature Learning

This paper introduces a zero-shot sound event classification (ZS-SEC) method to identify sound events that have never occurred in training data. In our previous work, we proposed a ZS-SEC method using sound attribute vectors (SAVs), where a…

Sound · Computer Science · 2023-03-21 · Yi-Han Lin, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi

Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections

In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes. Zero-shot learning in audio classification refers to…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-02-03 · Huang Xie, Okko Räsänen, Tuomas Virtanen

The Benefits of Label-Description Training for Zero-Shot Text Classification

Pretrained language models have improved zero-shot text classification by allowing the transfer of semantic knowledge from the training data in order to classify among specific label sets in downstream tasks. We propose a simple way to…

Computation and Language · Computer Science · 2023-10-24 · Lingyu Gao, Debanjan Ghosh, Kevin Gimpel

Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

Audio-visual zero-shot learning methods commonly build on features extracted from pre-trained models, e.g. video or audio classification models. However, existing benchmarks predate the popularization of large multi-modal models, such as…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-10 · David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings

In this paper, we propose a novel approach for generalized zero-shot learning in a multi-modal setting, where we have novel classes of audio/video during testing that are not seen during training. We use the semantic relatedness of text…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-24 · Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri

Zero-Shot Text Classification with Self-Training

Recent advances in large pretrained language models have increased attention to zero-shot text classification. In particular, models finetuned on natural language inference datasets have been widely adopted as zero-shot classifiers due to…

Computation and Language · Computer Science · 2022-11-01 · Ariel Gera, Alon Halfon, Eyal Shnarch, Yotam Perlitz, Liat Ein-Dor, Noam Slonim

Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a…

Sound · Computer Science · 2022-02-15 · Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

ChatGPT-guided Semantics for Zero-shot Learning

Zero-shot learning (ZSL) aims to classify objects that are not observed or seen during training. It relies on class semantic description to transfer knowledge from the seen classes to the unseen classes. Existing methods of obtaining class…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-19 · Fahimul Hoque Shubho, Townim Faisal Chowdhury, Ali Cheraghian, Morteza Saberi, Nabeel Mohammed, Shafin Rahman

Zero-shot audio captioning with audio-language model guidance and audio context keywords

Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-11-15 · Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata

Zero-shot Learning and Knowledge Transfer in Music Classification and Tagging

Music classification and tagging is conducted through categorical supervised learning with a fixed set of labels. In principle, this cannot make predictions on unseen labels. Zero-shot learning is an approach to solve the problem by using…

Multimedia · Computer Science · 2019-06-21 · Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam

Exploring Meta Information for Audio-based Zero-shot Bird Classification

Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This…

Sound · Computer Science · 2024-06-12 · Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller

Zero-Shot Federated Learning with New Classes for Audio Classification

Federated learning is an effective way of extracting insights from different user devices while preserving the privacy of users. However, new classes with completely unseen data distributions can stream across any device in a federated…

Machine Learning · Computer Science · 2021-06-21 · Gautham Krishna Gudur, Satheesh K. Perepu

Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning

We propose a novel approach for unsupervised zero-shot learning (ZSL) of classes based on their names. Most existing unsupervised ZSL methods aim to learn a model for directly comparing image features and class names. However, this proves…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-08 · Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis