Related papers: Multi-level Attention Model for Weakly Supervised …

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We…

Sound · Computer Science 2019-12-11 Qiuqiang Kong, Changsong Yu, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Audio Set classification with attention model: A probabilistic perspective

This paper investigates the classification of the Audio Set dataset. Audio Set is a large scale weakly labelled dataset of sound clips. Previous work used multiple instance learning (MIL) to classify weakly labelled data. In MIL, a bag…

Sound · Computer Science 2019-12-10 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Self-supervised Attention Model for Weakly Labeled Audio Event Classification

We describe a novel weakly labeled Audio Event Classification approach based on a self-supervised attention model. The weakly labeled framework is used to eliminate the need for expensive data labeling procedure and self-supervised…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-09 Bongjun Kim, Shabnam Ghaffarzadegan

Audio Event Detection using Weakly Labeled Data

Acoustic event detection is essential for content analysis and description of multimedia recordings. The majority of current literature on the topic learns the detectors through fully-supervised techniques employing strongly labeled data.…

Sound · Computer Science 2016-07-07 Anurag Kumar, Bhiksha Raj

A Global-local Attention Framework for Weakly Labelled Audio Tagging

Weakly labelled audio tagging aims to predict the classes of sound events within an audio clip, where the onset and offset times of the sound events are not provided. Previous works have used the multiple instance learning (MIL) framework,…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-04 Helin Wang, Yuexian Zou, Wenwu Wang

Weakly Supervised Scalable Audio Content Analysis

Audio Event Detection is an important task for content analysis of multimedia data. Most of the current works on detection of audio events is driven through supervised learning approaches. We propose a weakly supervised learning framework…

Sound · Computer Science 2016-06-14 Anurag Kumar, Bhiksha Raj

Multi-label Zero-Shot Audio Classification with Temporal Attention

Zero-shot learning models are capable of classifying new classes by transferring knowledge from the seen classes using auxiliary information. While most of the existing zero-shot learning methods focused on single-label classification…

Sound · Computer Science 2024-09-04 Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to…

Sound · Computer Science 2017-03-20 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Data-Efficient Weakly Supervised Learning for Low-Resource Audio Event Detection Using Deep Learning

We propose a method to perform audio event detection under the common constraint that only limited training data are available. In training a deep learning system to perform audio event detection, two practical problems arise. Firstly, most…

Sound · Computer Science 2018-10-29 Veronica Morfi, Dan Stowell

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we advocate that sound recognition is inherently a multi-modal audiovisual task in that it is easier to differentiate sounds…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-03 Haytham M. Fayek, Anurag Kumar

Learning Multi-instrument Classification with Partial Labels

Multi-instrument recognition is the task of predicting the presence or absence of different instruments within an audio clip. A considerable challenge in applying deep learning to multi-instrument recognition is the scarcity of labeled…

Sound · Computer Science 2020-01-27 Amir Kenarsari Anhari

Multi-attention Networks for Temporal Localization of Video-level Labels

Temporal localization remains an important challenge in video understanding. In this work, we present our solution to the 3rd YouTube-8M Video Understanding Challenge organized by Google Research. Participants were required to build a…

Computer Vision and Pattern Recognition · Computer Science 2019-11-19 Lijun Zhang, Srinath Nizampatnam, Ahana Gangopadhyay, Marcos V. Conde

A Closer Look at Weak Label Learning for Audio Events

Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and availability of large scale…

Sound · Computer Science 2018-04-26 Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data. Strongly labeled data can be simply understood as fully…

Machine Learning · Computer Science 2017-02-21 Anurag Kumar, Bhiksha Raj

Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events

We tackle the task of environmental event classification by drawing inspiration from the transformer neural network architecture used in machine translation. We modify this attention-based feedforward structure in such a way that allows the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-06 Wim Boes, Hugo Van hamme

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

Weakly Labelled learning has garnered lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as Multiple Instance Learning (MIL) problem. This paper proposes a Multi-Task Learning (MTL)…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-02 Soham Deshmukh, Bhiksha Raj, Rita Singh

Multi-label Open-set Audio Classification

Current audio classification models have small class vocabularies relative to the large number of sound event classes of interest in the real world. Thus, they provide a limited view of the world that may miss important yet unexpected or…

Sound · Computer Science 2023-10-24 Sripathi Sridhar, Mark Cartwright

Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection

Sound event detection is a challenging task, especially for scenes with multiple simultaneous events. While event classification methods tend to be fairly accurate, event localization presents additional challenges, especially when large…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-12 Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali

Segment Relevance Estimation for Audio Analysis and Weakly-Labelled Classification

We propose a method that quantifies the importance, namely relevance, of audio segments for classification in weakly-labelled problems. It works by drawing information from a set of class-wise one-vs-all classifiers. By selecting the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-13 Juliano Henrique Foleiss, Tiago Fernandes Tavares

Semi-Supervised Diseased Detection from Speech Dialogues with Multi-Level Data Modeling

Detecting medical conditions from speech acoustics is fundamentally a weakly-supervised learning problem: a single, often noisy, session-level label must be linked to nuanced patterns within a long, complex audio recording. This task is…

Sound · Computer Science 2026-04-21 Xingyuan Li, Mengyue Wu