Related papers: Multi-label Open-set Audio Classification

Polyphonic audio event detection: multi-label or multi-class multi-task classification problem?

Polyphonic events are the main error source of audio event detection (AED) systems. In deep-learning context, the most common approach to deal with event overlaps is to treat the AED task as a multi-label classification problem. By doing…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-01 Huy Phan, Thi Ngoc Tho Nguyen, Philipp Koch, Alfred Mertins

Multi-label Zero-Shot Audio Classification with Temporal Attention

Zero-shot learning models are capable of classifying new classes by transferring knowledge from the seen classes using auxiliary information. While most of the existing zero-shot learning methods focused on single-label classification…

Sound · Computer Science 2024-09-04 Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

Topic Model Based Multi-Label Classification from the Crowd

Multi-label classification is a common supervised machine learning problem where each instance is associated with multiple classes. The key challenge in this problem is learning the correlations between the classes. An additional challenge…

Machine Learning · Computer Science 2016-04-05 Divya Padmanabhan, Satyanath Bhat, Shirish Shevade, Y. Narahari

Multi-level Attention Model for Weakly Supervised Audio Classification

In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently,…

Audio and Speech Processing · Electrical Eng. & Systems 2018-03-08 Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang

SpeechMLC: Speech Multi-label Classification

In this paper, we propose a multi-label classification framework to detect multiple speaking styles in a speech sample. Unlike previous studies that have primarily focused on identifying a single target style, our framework effectively…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Miseul Kim, Seyun Um, Hyeonjin Cha, Hong-goo Kang

In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects,…

Machine Learning · Computer Science 2024-12-24 Ismail Hakki Karaman, Gulser Koksal, Levent Eriskin, Salih Salihoglu

Zero-shot Learning for Audio-based Music Classification and Tagging

Audio-based music classification and tagging is typically based on categorical supervised learning with a fixed set of labels. This intrinsically cannot handle unseen labels such as newly added music genres or semantic words that users…

Machine Learning · Computer Science 2020-03-20 Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam

Audio Event Detection using Weakly Labeled Data

Acoustic event detection is essential for content analysis and description of multimedia recordings. The majority of current literature on the topic learns the detectors through fully-supervised techniques employing strongly labeled data.…

Sound · Computer Science 2016-07-07 Anurag Kumar, Bhiksha Raj

Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one…

Sound · Computer Science 2020-07-28 Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra

Multi-label audio classification with a noisy zero-shot teacher

We propose a novel training scheme using self-label correction and data augmentation methods designed to deal with noisy labels and improve real-world accuracy on a polyphonic audio content detection task. The augmentation method reduces…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-23 Sebastian Braun, Hannes Gamper

Unifying Isolated and Overlapping Audio Event Detection with Multi-Label Multi-Task Convolutional Recurrent Neural Networks

We propose a multi-label multi-task framework based on a convolutional recurrent neural network to unify detection of isolated and overlapping audio events. The framework leverages the power of convolutional recurrent neural network…

Machine Learning · Computer Science 2019-02-20 Huy Phan, Oliver Y. Chén, Philipp Koch, Lam Pham, Ian McLoughlin, Alfred Mertins, Maarten De Vos

Inducing Generalized Multi-Label Rules with Learning Classifier Systems

In recent years, multi-label classification has attracted a significant body of research, motivated by real-life applications, such as text classification and medical diagnoses. Although sparsely studied in this context, Learning Classifier…

Neural and Evolutionary Computing · Computer Science 2015-12-29 Fani A. Tzima, Miltiadis Allamanis, Alexandros Filotheou, Pericles A. Mitkas

Acoustic Scene Classification using Audio Tagging

Acoustic scene classification systems using deep neural networks classify given recordings into pre-defined classes. In this study, we propose a novel scheme for acoustic scene classification which adopts an audio tagging system inspired by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-21 Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu

Evaluating Multi-label Classifiers with Noisy Labels

Multi-label classification (MLC) is a generalization of standard classification where multiple labels may be assigned to a given sample. In the real world, it is more common to deal with noisy datasets than clean datasets, given how modern…

Machine Learning · Computer Science 2021-02-18 Wenting Zhao, Carla Gomes

Large Language Models for Dysfluency Detection in Stuttered Speech

Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the…

Sound · Computer Science 2024-06-18 Dominik Wagner, Sebastian P. Bayerl, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet

Learning to Separate Object Sounds by Watching Unlabeled Video

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Ruohan Gao, Rogerio Feris, Kristen Grauman

Towards joint sound scene and polyphonic sound event recognition

Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels and use this…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-02 Helen L. Bear, Ines Nolasco, Emmanouil Benetos

SoundNet: Learning Sound Representations from Unlabeled Video

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using…

Computer Vision and Pattern Recognition · Computer Science 2016-10-31 Yusuf Aytar, Carl Vondrick, Antonio Torralba

SALT: Standardized Audio event Label Taxonomy

Machine listening systems often rely on fixed taxonomies to organize and label audio data, key for training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints:…

Sound · Computer Science 2024-09-19 Paraskevas Stamatiadis, Michel Olvera, Slim Essid

Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures

Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording. With advances in deep neural…

Sound · Computer Science 2024-12-31 Sangwook Park, David K. Han, Mounya Elhilali