Related papers: LEAN: Light and Efficient Audio Classification Net…
Audio pattern recognition is an important research topic in the machine learning area, and includes several tasks such as audio tagging, acoustic scene classification, music classification, speech emotion classification and sound event…
In this paper, we show that ImageNet-pretrained standard deep CNN models can be used as strong baseline networks for audio classification. Even though there is a significant difference between audio spectrograms and standard ImageNet image…
Speech enhancement is the task of improving the intelligibility and perceptual quality of a degraded speech signal. Recently, neural network based methods have been applied to speech enhancement. However, many neural network based methods…
In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…
In computer vision, convolutional neural networks (CNNs) such as ConvNeXt have been able to surpass state-of-the-art transformers, partly thanks to depthwise separable convolutions (DSC). DSC, as an approximation of the regular convolution,…
Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used throughout the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…
This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside…
Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking. Recent machine learning methods, such as convolutional neural networks (CNNs),…
Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We…
This work addresses the need for enhanced accuracy and efficiency in speech command recognition systems, a critical component for improving user interaction in various smart applications. Leveraging the robust pretrained YAMNet model and…
In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently,…
Audio pattern recognition (APR) is an important research topic and can be applied to several fields related to our lives. Therefore, accurate and efficient APR systems need to be developed as they are useful in real applications. In this…
Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we advocate that sound recognition is inherently a multi-modal audiovisual task in that it is easier to differentiate sounds…
Audio classification is the task of identifying the sound categories that are associated with a given audio signal. This paper presents an investigation on large-scale audio classification based on the recently released AudioSet database.…
After their sweeping success in vision and language tasks, pure attention-based neural architectures (e.g. DeiT) are rising to the top of audio tagging (AT) leaderboards, which seemingly obsoletes traditional convolutional neural networks…
Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…
Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification…
We propose a method that quantifies the importance, namely relevance, of audio segments for classification in weakly-labelled problems. It works by drawing information from a set of class-wise one-vs-all classifiers. By selecting the…
Audio tagging is an active research area and has a wide range of applications. Since the release of AudioSet, great progress has been made in advancing model performance, which mostly comes from the development of novel model architectures…
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with…