Related papers: Learnable Frontends that do not Learn: Quantifying…

200 papers

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…

Sound · Computer Science 2022-03-30 Sarthak Yadav, Neil Zeghidour

Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications. Recently, there has been a growing interest in learnable front-ends that extract representations directly from the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-06 Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter, Gerald Gutenbrunner

Autonomous recording units and passive acoustic monitoring present minimally intrusive methods of collecting bioacoustics data. Combining this data with species agnostic bird activity detection systems enables the monitoring of activity…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-04 Mark Anderson, Naomi Harte

Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV). This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-16 Jingyu Li, Yusheng Tian, Tan Lee

Designing appropriate features for acoustic event recognition tasks is an active field of research. Expressive features should both improve the performance of the tasks and also be interpretable. Currently, heuristically designed features…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

The purpose of this paper is to compare different learnable frontends in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by…

Sound · Computer Science 2026-01-21 Alessandro Maria Poirè, Federico Simonetta, Stavros Ntalampiras

We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed representation non-learnable front-ends like spectrogram or mel-spectrogram with/without neural…

Sound · Computer Science 2024-12-24 Prateek Verma, Chris Chafe

There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems. However, there is a dearth of analyses of what is actually learnt and the relative importance of training the different…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-11 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Reliably monitoring and recognizing maritime vessels based on acoustic signatures is complicated by the variability of different recording scenarios. A robust classification framework must be able to generalize across diverse acoustic…

Sound · Computer Science 2025-06-02 Jonas Elsborg, Tejs Vegge, Arghya Bhowmik

Recent years have witnessed a boom in self-supervised learning (SSL) in various areas including speech processing. Speech based SSL models present promising performance in a range of speech related tasks. However, the training of SSL models…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-21 Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

Current speech recognition architectures perform very well from the point of view of machine learning, hence user interaction. This suggests that they are emulating the human biological system well. We investigate whether the inference can…

Neurons and Cognition · Quantitative Biology 2022-08-26 Louise Coppieters de Gibson, Philip N. Garner

While models in audio and speech processing are becoming deeper and more end-to-end, they as a consequence need expensive training on large data, and are often brittle. We build on a classical model of human hearing and make it…

Sound · Computer Science 2024-09-16 Ruolan Leslie Famularo, Dmitry N. Zotkin, Shihab A. Shamma, Ramani Duraiswami

Deep Learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet, these models often lack interpretability to fully understand the exact…

Sound · Computer Science 2021-08-04 Rachid Riad, Julien Karadayi, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux

Triangular, overlapping Mel-scaled filters ("f-banks") are the current standard input for acoustic models that exploit their input's time-frequency geometry, because they provide a psycho-acoustically motivated time-frequency geometry for a…

Machine Learning · Computer Science 2019-01-03 Sean Robertson, Gerald Penn, Yingxue Wang

Most of the speech processing applications use triangular filters spaced in mel-scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from a given speech data.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-22 Susanta Sarangi, Md Sahidullah, Goutam Saha

In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representation. However, their parameters remain fixed once trained, lacking flexibility during inference and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-29 Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li

In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-27 Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen, John H. L. Hansen

Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task utilize a set of Finite Impulse Response (FIR)…

Computer Vision and Pattern Recognition · Computer Science 2019-04-25 Ahmed Imtiaz Humayun, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan