Related papers: Learnable Frontends that do not Learn: Quantifying…

LEAF: A Learnable Frontend for Audio Classification

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Learning neural audio features without supervision

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…

Sound · Computer Science 2022-03-30 Sarthak Yadav, Neil Zeghidour

Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends

Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications. Recently, there has been a growing interest in learnable front-ends that extract representations directly from the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-06 Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter, Gerald Gutenbrunner

Learnable Acoustic Frontends in Bird Activity Detection

Autonomous recording units and passive acoustic monitoring present minimally intrusive methods of collecting bioacoustics data. Combining this data with species agnostic bird activity detection systems enables the monitoring of activity…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-04 Mark Anderson, Naomi Harte

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

Learning Filter Banks Using Deep Learning For Acoustic Signals

Designing appropriate features for acoustic event recognition tasks is an active field of research. Expressive features should both improve the performance of the tasks and also be interpretable. Currently, heuristically designed features…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

Deep Feature Learning for Medical Acoustics

The purpose of this paper is to compare different learnable frontends in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by…

Sound · Computer Science 2026-01-21 Alessandro Maria Poirè, Federico Simonetta, Stavros Ntalampiras

Content Adaptive Front End For Audio Classification

We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed representation non-learnable front-ends like spectrogram or mel-spectrogram with/without neural…

Sound · Computer Science 2024-12-24 Prateek Verma, Chris Chafe

What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions

There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems. However, there is a dearth of analyses of what is actually learnt and the relative importance of training the different…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-11 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Acoustic Classification of Maritime Vessels using Learnable Filterbanks

Reliably monitoring and recognizing maritime vessels based on acoustic signatures is complicated by the variability of different recording scenarios. A robust classification framework must be able to generalize across diverse acoustic…

Sound · Computer Science 2025-06-02 Jonas Elsborg, Tejs Vegge, Arghya Bhowmik

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

Recent years have witnessed a boom in self-supervised learning (SSL) in various areas including speech processing. Speech based SSL models present promising performance in a range of speech related tasks. However, the training of SSL models…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-21 Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

Low-Level Physiological Implications of End-to-End Learning of Speech Recognition

Current speech recognition architectures perform very well from the point of view of machine learning, hence user interaction. This suggests that they are emulating the human biological system well. We investigate whether the inference can…

Neurons and Cognition · Quantitative Biology 2022-08-26 Louise Coppieters de Gibson, Philip N. Garner

Biomimetic Frontend for Differentiable Audio Processing

While models in audio and speech processing are becoming deeper and more end-to-end, they as a consequence need expensive training on large data, and are often brittle. We build on a classical model of human hearing and make it…

Sound · Computer Science 2024-09-16 Ruolan Leslie Famularo, Dmitry N. Zotkin, Shihab A. Shamma, Ramani Duraiswami

Learning spectro-temporal representations of complex sounds with parameterized neural networks

Deep Learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet, these models often lack interpretability to fully understand the exact…

Sound · Computer Science 2021-08-04 Rachid Riad, Julien Karadayi, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux

Exploring spectro-temporal features in end-to-end convolutional neural networks

Triangular, overlapping Mel-scaled filters ("f-banks") are the current standard input for acoustic models that exploit their input's time-frequency geometry, because they provide a psycho-acoustically motivated time-frequency geometry for a…

Machine Learning · Computer Science 2019-01-03 Sean Robertson, Gerald Penn, Yingxue Wang

Optimization of data-driven filterbank for automatic speaker verification

Most of the speech processing applications use triangular filters spaced in mel-scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from a given speech data.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-22 Susanta Sarangi, Md Sahidullah, Goutam Saha

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing

In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li

Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting

In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-27 Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen, John H. L. Hansen

Learning Front-end Filter-bank Parameters using Convolutional Neural Networks for Abnormal Heart Sound Detection

Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task utilize a set of Finite Impulse Response (FIR)…

Computer Vision and Pattern Recognition · Computer Science 2019-04-25 Ahmed Imtiaz Humayun, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan