Related papers: Learnable Frontends that do not Learn: Quantifying…
Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…
Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…
Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications. Recently, there has been a growing interest in learnable front-ends that extract representations directly from the…
In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…
Autonomous recording units and passive acoustic monitoring present minimally intrusive methods of collecting bioacoustics data. Combining this data with species agnostic bird activity detection systems enables the monitoring of activity…
Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…
Designing appropriate features for acoustic event recognition tasks is an active field of research. Expressive features should both improve the performance of the tasks and also be interpret-able. Currently, heuristically designed features…
The purpose of this paper is to compare different learnable frontends in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by…
We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed representation non-learnable front-ends like spectrogram or mel-spectrogram with/without neural…
There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems. However, there is a dearth of analyses of what is actually learnt and the relative importance of training the different…
Reliably monitoring and recognizing maritime vessels based on acoustic signatures is complicated by the variability of different recording scenarios. A robust classification framework must be able to generalize across diverse acoustic…
Recent years have witnessed a boom in self-supervised learning (SSL) in various areas including speech processing. Speech based SSL models present promising performance in a range of speech related tasks. However, the training of SSL models…
Current speech recognition architectures perform very well from the point of view of machine learning, hence user interaction. This suggests that they are emulating the human biological system well. We investigate whether the inference can…
While models in audio and speech processing are becoming deeper and more end-to-end, they as a consequence need expensive training on large data, and are often brittle. We build on a classical model of human hearing and make it…
Deep Learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet, these models often lack interpretability to fully understand the exact…
Triangular, overlapping Mel-scaled filters ("f-banks") are the current standard input for acoustic models that exploit their input's time-frequency geometry, because they provide a psycho-acoustically motivated time-frequency geometry for a…
Most of the speech processing applications use triangular filters spaced in mel-scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from a given speech data.…
In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…
In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech…
Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task utilize a set of Finite Impulse Response (FIR)…