Related papers: LEAN: Light and Efficient Audio Classification Net…

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Audio pattern recognition is an important research topic in the machine learning area, and includes several tasks such as audio tagging, acoustic scene classification, music classification, speech emotion classification and sound event…

Sound · Computer Science 2020-08-25 Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley

Rethinking CNN Models for Audio Classification

In this paper, we show that ImageNet-Pretrained standard deep CNN models can be used as strong baseline networks for audio classification. Even though there is a significant difference between audio Spectrogram and standard ImageNet image…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Kamalesh Palanisamy, Dipika Singhania, Angela Yao

Speech enhancement with weakly labelled data from AudioSet

Speech enhancement is a task to improve the intelligibility and perceptual quality of degraded speech signal. Recently, neural networks based methods have been applied to speech enhancement. However, many neural network based methods…

Sound · Computer Science 2021-02-22 Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter, Gerald Gutenbrunner

Adapting a ConvNeXt model to audio classification on AudioSet

In computer vision, convolutional neural networks (CNN) such as ConvNeXt, have been able to surpass state-of-the-art transformers, partly thanks to depthwise separable convolutions (DSC). DSC, as an approximation of the regular convolution,…

Sound · Computer Science 2023-06-02 Thomas Pellegrini, Ismail Khalfaoui-Hassani, Etienne Labbé, Timothée Masquelier

LEAF: A Learnable Frontend for Audio Classification

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data

This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside…

Sound · Computer Science 2024-04-09 Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas X. Perri, Houman Khosravani

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks

Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking. Recent machine learning methods, such as convolutional neural networks (CNNs),…

Sound · Computer Science 2023-05-31 Arshdeep Singh, Haohe Liu, Mark D. Plumbley

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We…

Sound · Computer Science 2019-12-11 Qiuqiang Kong, Changsong Yu, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning

This work addresses the need for enhanced accuracy and efficiency in speech command recognition systems, a critical component for improving user interaction in various smart applications. Leveraging the robust pretrained YAMNet model and…

Sound · Computer Science 2025-04-29 Sidahmed Lachenani, Hamza Kheddar, Mohamed Ouldzmirli

Multi-level Attention Model for Weakly Supervised Audio Classification

In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently,…

Audio and Speech Processing · Electrical Eng. & Systems 2018-03-08 Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang

ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition

Audio pattern recognition (APR) is an important research topic and can be applied to several fields related to our lives. Therefore, accurate and efficient APR systems need to be developed as they are useful in real applications. In this…

Sound · Computer Science 2022-07-21 Sergey Verbitskiy, Vladimir Berikov, Viacheslav Vyshegorodtsev

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we advocate that sound recognition is inherently a multi-modal audiovisual task in that it is easier to differentiate sounds…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-03 Haytham M. Fayek, Anurag Kumar

Reducing Model Complexity for DNN Based Large-Scale Audio Classification

Audio classification is the task of identifying the sound categories that are associated with a given audio signal. This paper presents an investigation on large-scale audio classification based on the recently released AudioSet database.…

Sound · Computer Science 2018-10-31 Yuzhong Wu, Tan Lee

AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification

After its sweeping success in vision and language tasks, pure attention-based neural architectures (e.g. DeiT) are emerging to the top of audio tagging (AT) leaderboards, which seemingly obsoletes traditional convolutional neural networks…

Sound · Computer Science 2022-08-25 Juncheng B Li, Shuhui Qu, Po-Yao Huang, Florian Metze

Learning neural audio features without supervision

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…

Sound · Computer Science 2022-03-30 Sarthak Yadav, Neil Zeghidour

LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition

Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification…

Sound · Computer Science 2024-03-28 Sayanton V. Dibbo, Juston S. Moore, Garrett T. Kenyon, Michael A. Teti

Segment Relevance Estimation for Audio Analysis and Weakly-Labelled Classification

We propose a method that quantifies the importance, namely relevance, of audio segments for classification in weakly-labelled problems. It works by drawing information from a set of class-wise one-vs-all classifiers. By selecting the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-13 Juliano Henrique Foleiss, Tiago Fernandes Tavares

PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

Audio tagging is an active research area and has a wide range of applications. Since the release of AudioSet, great progress has been made in advancing model performance, which mostly comes from the development of novel model architectures…

Sound · Computer Science 2021-11-18 Yuan Gong, Yu-An Chung, James Glass

CNN Architectures for Large-Scale Audio Classification

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with…

Sound · Computer Science 2017-01-11 Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson