Related papers: LEAF: A Learnable Frontend for Audio Classificatio…

Learning neural audio features without supervision

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…

Sound · Computer Science 2022-03-30 Sarthak Yadav, Neil Zeghidour

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter, Gerald Gutenbrunner

Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends

Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications. Recently, there has been a growing interest in learnable front-ends that extract representations directly from the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-06 Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li

Deep Feature Learning for Medical Acoustics

The purpose of this paper is to compare different learnable frontends in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by…

Sound · Computer Science 2026-01-21 Alessandro Maria Poirè, Federico Simonetta, Stavros Ntalampiras

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation

While much of modern speech and audio processing relies on deep neural networks trained using fixed audio representations, recent studies suggest great potential in acoustic frontends learnt jointly with a backend. In this study, we focus…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-21 Mark Anderson, Tomi Kinnunen, Naomi Harte

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing

In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li

Content Adaptive Front End For Audio Classification

We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed representation non-learnable front-ends like spectrogram or mel-spectrogram with/without neural…

Sound · Computer Science 2024-12-24 Prateek Verma, Chris Chafe

What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions

There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems. However, there is a dearth of analyses of what is actually learnt and the relative importance of training the different…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-11 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Simple Pooling Front-ends For Efficient Audio Classification

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios. Most existing approaches are designed to reduce the size of audio neural networks using methods such as model pruning. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-09 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Learning Filter Banks Using Deep Learning For Acoustic Signals

Designing appropriate features for acoustic event recognition tasks is an active field of research. Expressive features should both improve the performance of the tasks and also be interpretable. Currently, heuristically designed features…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

End-to-End Speech Recognition From the Raw Waveform

State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw…

Computation and Language · Computer Science 2018-06-22 Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

Biomimetic Frontend for Differentiable Audio Processing

While models in audio and speech processing are becoming deeper and more end-to-end, they as a consequence need expensive training on large data, and are often brittle. We build on a classical model of human hearing and make it…

Sound · Computer Science 2024-09-16 Ruolan Leslie Famularo, Dmitry N. Zotkin, Shihab A. Shamma, Ramani Duraiswami

Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music

Modern audio systems universally employ mel-scale representations derived from 1940s Western psychoacoustic studies, potentially encoding cultural biases that create systematic performance disparities. We present a comprehensive evaluation…

Sound · Computer Science 2026-04-14 Shivam Chauhan, Ajay Pundhir

LEAN: Light and Efficient Audio Classification Network

Over the past few years, the audio classification task on large-scale datasets such as AudioSet has been an important research area. Several deeper Convolution-based Neural networks have shown compelling performance, notably Vggish, YAMNet, and…

Sound · Computer Science 2023-05-23 Shwetank Choudhary, CR Karthik, Punuru Sri Lakshmi, Sumit Kumar

Learnable Acoustic Frontends in Bird Activity Detection

Autonomous recording units and passive acoustic monitoring present minimally intrusive methods of collecting bioacoustics data. Combining this data with species agnostic bird activity detection systems enables the monitoring of activity…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-04 Mark Anderson, Naomi Harte

Adaptive Representations of Sound for Automatic Insect Recognition

Insect population numbers and biodiversity have been rapidly declining with time, and monitoring these trends has become increasingly important for conservation measures to be effectively implemented. But monitoring methods are often…

Sound · Computer Science 2024-02-01 Marius Faiß, Dan Stowell

An Investigation of the Effectiveness of Phase for Audio Classification

While log-amplitude mel-spectrogram has widely been used as the feature representation for processing speech based on deep learning, the effectiveness of another aspect of speech spectrum, i.e., phase information, was shown recently for…

Sound · Computer Science 2022-05-02 Shunsuke Hidaka, Kohei Wakamiya, Tokihiko Kaburagi

PERSA+: A Deep Learning Front-End for Context-Agnostic Audio Classification

Deep learning has been applied to diverse audio semantics tasks, enabling the construction of models that learn hierarchical levels of features from high-dimensional raw data, delivering state-of-the-art performance. But do these algorithms…

Sound · Computer Science 2021-07-21 Lazaros Vrysis, Iordanis Thoidis, Charalampos Dimoulas, George Papanikolaou

Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?

When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which…

Machine Learning · Computer Science 2018-09-20 Monika Doerfler, Thomas Grill, Roswitha Bammer, Arthur Flexer