Related papers: Learning neural audio features without supervision

LEAF: A Learnable Frontend for Audio Classification

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Deep Feature Learning for Medical Acoustics

The purpose of this paper is to compare different learnable frontends in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by…

Sound · Computer Science 2026-01-21 Alessandro Maria Poirè, Federico Simonetta, Stavros Ntalampiras

Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation

While much of modern speech and audio processing relies on deep neural networks trained using fixed audio representations, recent studies suggest great potential in acoustic frontends learnt jointly with a backend. In this study, we focus…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-21 Mark Anderson, Tomi Kinnunen, Naomi Harte

Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends

Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications. Recently, there has been a growing interest in learnable front-ends that extract representations directly from the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-06 Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li

Content Adaptive Front End For Audio Classification

We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed representation non-learnable front-ends like spectrogram or mel-spectrogram with/without neural…

Sound · Computer Science 2024-12-24 Prateek Verma, Chris Chafe

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to…

Sound · Computer Science 2014-07-14 Dan Stowell, Mark D. Plumbley

Unsupervised Discriminative Learning of Sounds for Audio Event Classification

Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet. While this process allows knowledge transfer across different domains, training a model on large-scale…

Sound · Computer Science 2021-05-21 Sascha Hornauer, Ke Li, Stella X. Yu, Shabnam Ghaffarzadegan, Liu Ren

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal…

Sound · Computer Science 2022-10-11 Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann

Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?

When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which…

Machine Learning · Computer Science 2018-09-20 Monika Doerfler, Thomas Grill, Roswitha Bammer, Arthur Flexer

Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study

Deep neural networks have recently achieved breakthroughs in sound generation. Despite the outstanding sample quality, current sound generation models face issues on small-scale datasets (e.g., overfitting), significantly limiting…

Sound · Computer Science 2024-07-30 Yi Yuan, Haohe Liu, Jinhua Liang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter, Gerald Gutenbrunner

PERSA+: A Deep Learning Front-End for Context-Agnostic Audio Classification

Deep learning has been applied to diverse audio semantics tasks, enabling the construction of models that learn hierarchical levels of features from high-dimensional raw data, delivering state-of-the-art performance. But do these algorithms…

Sound · Computer Science 2021-07-21 Lazaros Vrysis, Iordanis Thoidis, Charalampos Dimoulas, George Papanikolaou

Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

Self-supervised pre-training using so-called "pretext" tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations, by pre-training a model to…

Sound · Computer Science 2021-05-05 Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio,…

Sound · Computer Science 2020-05-05 Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

SELF: Learning to Filter Noisy Labels with Self-Ensembling

Deep neural networks (DNNs) have been shown to over-fit a dataset when being trained with noisy labels for a long enough time. To overcome this problem, we present a simple and effective method self-ensemble label filtering (SELF) to…

Computer Vision and Pattern Recognition · Computer Science 2019-10-07 Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing

In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li

Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal…

Sound · Computer Science 2020-11-05 Soo-Whan Chung, Hong Goo Kang, Joon Son Chung

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

Many current deep learning approaches make extensive use of backbone networks pre-trained on large datasets like ImageNet, which are then fine-tuned to perform a certain task. In remote sensing, the lack of comparable large annotated…

Computer Vision and Pattern Recognition · Computer Science 2024-08-22 Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Sound · Computer Science 2022-01-10 Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf