Related papers: Simple Pooling Front-ends For Efficient Audio Clas…

200 papers

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour , Olivier Teboul , Félix de Chaumont Quitry , Marco Tagliasacchi

While the log-amplitude mel-spectrogram has been widely used as the feature representation for processing speech based on deep learning, the effectiveness of another aspect of the speech spectrum, i.e., phase information, was shown recently for…

Sound · Computer Science 2022-05-02 Shunsuke Hidaka , Kohei Wakamiya , Tokihiko Kaburagi

Large Speech Language Models (LSLMs) typically operate at high token rates (tokens/s) to ensure acoustic fidelity, yet this results in sequence lengths that far exceed the underlying semantic content, incurring prohibitive inference costs.…

Computation and Language · Computer Science 2026-04-09 Bajian Xiang , Tingwei Guo , Xuan Chen , Yang Han

Access to large corpora with strongly labelled sound events is expensive and difficult in engineering applications. Much research turns to address the problem of how to detect both the types and the timestamps of sound events with weak…

Sound · Computer Science 2021-01-21 Yuzhuo Liu , Hangting Chen , Yun Wang , Pengyuan Zhang

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…

Sound · Computer Science 2022-03-30 Sarthak Yadav , Neil Zeghidour

Generative models are capable of addressing difficult problems with non-unique solutions like bandwidth extension and gap filling, removing highly non-linear artifacts from codecs, clipping and distortion, as opposed to removing linear…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-18 Sebastian Braun

Despite the advancements in cutting-edge technologies, audio signal processing continues to pose challenges and lacks the precision of a human speech processing system. To address these challenges, we propose a novel approach to simplify…

Sound · Computer Science 2026-03-26 Rinku Sebastian , Simon O'Keefe , Martin Trefzer

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter , Gerald Gutenbrunner

We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic…

Machine Learning · Statistics 2014-11-27 Dawen Liang , Matthew D. Hoffman , Gautham J. Mysore

As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speaking, the raw signal can…

Multimedia · Computer Science 2020-02-25 Liang Gao , Kele Xu , Huaimin Wang , Yuxing Peng

In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-14 Efthymios Tzinis , Zhepei Wang , Paris Smaragdis

Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices…

Computer Vision and Pattern Recognition · Computer Science 2020-10-26 Oindrila Saha , Aditya Kusupati , Harsha Vardhan Simhadri , Manik Varma , Prateek Jain

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem. In this study, we provide a family of efficient neural network…

Sound · Computer Science 2022-02-01 Efthymios Tzinis , Zhepei Wang , Xilin Jiang , Paris Smaragdis

Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields, such as computer vision and speech signal processing. Nevertheless, the same strategies have yielded ungratified…

Sound · Computer Science 2021-11-09 Yu-Chen Lin , Cheng Yu , Yi-Te Hsu , Szu-Wei Fu , Yu Tsao , Tei-Wei Kuo

We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design…

Audio and Speech Processing · Electrical Eng. & Systems 2025-04-15 Gloria Dal Santo , Gian Marco De Bortoli , Karolina Prawda , Sebastian J. Schlecht , Vesa Välimäki

Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the…

Sound · Computer Science 2018-08-13 Brian McFee , Justin Salamon , Juan Pablo Bello

In recent years, semantic segmentation has flourished in various applications. However, the high computational cost remains a significant challenge that hinders its further adoption. The filter pruning method for structured network slimming…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Dongyue Wu , Zilin Guo , Li Yu , Nong Sang , Changxin Gao

Over the past few years, the audio classification task on large-scale datasets such as AudioSet has been an important research area. Several deeper convolution-based neural networks have shown compelling performance, notably VGGish, YAMNet, and…

Sound · Computer Science 2023-05-23 Shwetank Choudhary , CR Karthik , Punuru Sri Lakshmi , Sumit Kumar

This paper explores the impact of dimensionality reduction and pooling methods for Environmental Sound Classification (ESC) using lightweight CNNs. We evaluate Sparse Salient Region Pooling (SSRP) and its variants, SSRP-Basic (SSRP-B) and…

Signal Processing · Electrical Eng. & Systems 2025-11-14 Parinaz Binandeh Dehaghani , Danilo Pena , A. Pedro Aguiar

This paper focuses on channel pruning for semantic segmentation networks. Previous methods to compress and accelerate deep neural networks in the classification task cannot be straightforwardly applied to the semantic segmentation network…

Computer Vision and Pattern Recognition · Computer Science 2022-08-30 Xinghao Chen , Yiman Zhang , Yunhe Wang