Related papers: Simple Pooling Front-ends For Efficient Audio Clas…

LEAF: A Learnable Frontend for Audio Classification

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental…

Sound · Computer Science 2021-01-22 Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

An Investigation of the Effectiveness of Phase for Audio Classification

While log-amplitude mel-spectrogram has widely been used as the feature representation for processing speech based on deep learning, the effectiveness of another aspect of speech spectrum, i.e., phase information, was shown recently for…

Sound · Computer Science 2022-05-02 Shunsuke Hidaka, Kohei Wakamiya, Tokihiko Kaburagi

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Large Speech Language Models (LSLMs) typically operate at high token rates (tokens/s) to ensure acoustic fidelity, yet this results in sequence lengths that far exceed the underlying semantic content, incurring prohibitive inference costs.…

Computation and Language · Computer Science 2026-04-09 Bajian Xiang, Tingwei Guo, Xuan Chen, Yang Han

Power pooling: An adaptive pooling function for weakly labelled sound event detection

Access to large corpora with strongly labelled sound events is expensive and difficult in engineering applications. Much research turns to address the problem of how to detect both the types and the timestamps of sound events with weak…

Sound · Computer Science 2021-01-21 Yuzhuo Liu, Hangting Chen, Yun Wang, Pengyuan Zhang

Learning neural audio features without supervision

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first one explores "learnable frontends",…

Sound · Computer Science 2022-03-30 Sarthak Yadav, Neil Zeghidour

Real-time Speech Restoration using Data Prediction Mean Flows

Generative models are capable of addressing difficult problems with non-unique solutions like bandwidth extension and gap filling, removing highly non-linear artifacts from codecs, clipping and distortion, as opposed to removing linear…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-18 Sebastian Braun

Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing

Despite the advancements in cutting-edge technologies, audio signal processing continues to pose challenges and lacks the precision of a human speech processing system. To address these challenges, we propose a novel approach to simplify…

Sound · Computer Science 2026-03-26 Rinku Sebastian, Simon O'Keefe, Martin Trefzer

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy…

Sound · Computer Science 2022-07-13 Jan Schlüter, Gerald Gutenbrunner

A Generative Product-of-Filters Model of Audio

We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic…

Machine Learning · Statistics 2014-11-27 Dawen Liang, Matthew D. Hoffman, Gautham J. Mysore

Multi-Representation Knowledge Distillation For Audio Classification

As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speaking, the raw signal can…

Multimedia · Computer Science 2020-02-25 Liang Gao, Kele Xu, Huaimin Wang, Yuxing Peng

Sudo rm -rf: Efficient Networks for Universal Audio Source Separation

In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-14 Efthymios Tzinis, Zhepei Wang, Paris Smaragdis

RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference

Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices…

Computer Vision and Pattern Recognition · Computer Science 2020-10-26 Oindrila Saha, Aditya Kusupati, Harsha Vardhan Simhadri, Manik Varma, Prateek Jain

Compute and memory efficient universal sound source separation

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem. In this study, we provide a family of efficient neural network…

Sound · Computer Science 2022-02-01 Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields, such as computer vision and speech signal processing. Nevertheless, the same strategies have yielded ungratified…

Sound · Computer Science 2021-11-09 Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing

We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design…

Audio and Speech Processing · Electrical Eng. & Systems 2025-04-15 Gloria Dal Santo, Gian Marco De Bortoli, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Adaptive pooling operators for weakly labeled sound event detection

Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the…

Sound · Computer Science 2018-08-13 Brian McFee, Justin Salamon, Juan Pablo Bello

Structural Pruning via Spatial-aware Information Redundancy for Semantic Segmentation

In recent years, semantic segmentation has flourished in various applications. However, the high computational cost remains a significant challenge that hinders its further adoption. The filter pruning method for structured network slimming…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Dongyue Wu, Zilin Guo, Li Yu, Nong Sang, Changxin Gao

LEAN: Light and Efficient Audio Classification Network

Over the past few years, audio classification task on large-scale dataset such as AudioSet has been an important research area. Several deeper Convolution-based Neural networks have shown compelling performance notably Vggish, YAMNet, and…

Sound · Computer Science 2023-05-23 Shwetank Choudhary, CR Karthik, Punuru Sri Lakshmi, Sumit Kumar

Investigation of Feature Selection and Pooling Methods for Environmental Sound Classification

This paper explores the impact of dimensionality reduction and pooling methods for Environmental Sound Classification (ESC) using lightweight CNNs. We evaluate Sparse Salient Region Pooling (SSRP) and its variants, SSRP-Basic (SSRP-B) and…

Signal Processing · Electrical Eng. & Systems 2025-11-14 Parinaz Binandeh Dehaghani, Danilo Pena, A. Pedro Aguiar

MTP: Multi-Task Pruning for Efficient Semantic Segmentation Networks

This paper focuses on channel pruning for semantic segmentation networks. Previous methods to compress and accelerate deep neural networks in the classification task cannot be straightforwardly applied to the semantic segmentation network…

Computer Vision and Pattern Recognition · Computer Science 2022-08-30 Xinghao Chen, Yiman Zhang, Yunhe Wang