Related papers: Learning Temporal Resolution in Spectrogram for Au…

Audio Classification from Time-Frequency Texture

Time-frequency representations of audio signals often resemble texture images. This paper derives a simple audio classification algorithm based on treating sound spectrograms as texture images. The algorithm is inspired by an earlier visual…

Computer Vision and Pattern Recognition · Computer Science 2008-09-29 Guoshen Yu, Jean-Jacques Slotine

Audio Time-Scale Modification with Temporal Compressing Networks

We propose a novel approach for time-scale modification of audio signals. Unlike traditional methods that rely on the framing technique or the short-time Fourier transform to preserve the frequency during temporal stretching, our neural…

Sound · Computer Science 2023-10-09 Ernie Chu, Ju-Ting Chen, Chia-Ping Chen

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks

Modern-day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…

Sound · Computer Science 2024-10-30 Noel Elias

Time-Frequency Audio Features for Speech-Music Classification

Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-06 Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha

Histogram of gradients of Time-Frequency Representations for Audio scene detection

This paper addresses the problem of audio scenes classification and contributes to the state of the art by proposing a novel feature. We build this feature by considering histogram of gradients (HOG) of time-frequency representation of an…

Sound · Computer Science 2015-08-21 Alain Rakotomamonjy, Gilles Gasso

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

Transformers have become central to recent advances in audio classification. However, training an audio spectrogram transformer, e.g. AST, from scratch can be resource and time-intensive. Furthermore, the complexity of transformers heavily…

Sound · Computer Science 2024-01-17 Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

AudioSR: Versatile Audio Super-resolution at Scale

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types…

Sound · Computer Science 2023-09-15 Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley

Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection

Many state-of-the-art systems for audio tagging and sound event detection employ convolutional recurrent neural architectures. Typically, they are trained in a mean teacher setting to deal with the heterogeneous annotation of the available…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-28 Wim Boes, Hugo Van hamme

SpecTNT: a Time-Frequency Transformer for Music Audio

Transformers have drawn attention in the MIR field for their remarkable performance shown in natural language processing and computer vision. However, prior works in the audio processing domain mostly use Transformer as a temporal feature…

Sound · Computer Science 2021-10-26 Wei-Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song

Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices

Machine learning algorithms, when trained on audio recordings from a limited set of devices, may not generalize well to samples recorded using other devices with different frequency responses. In this work, a relatively straightforward…

Sound · Computer Science 2021-05-26 Michał Kośmider

An Investigation of the Effectiveness of Phase for Audio Classification

While log-amplitude mel-spectrogram has widely been used as the feature representation for processing speech based on deep learning, the effectiveness of another aspect of speech spectrum, i.e., phase information, was shown recently for…

Sound · Computer Science 2022-05-02 Shunsuke Hidaka, Kohei Wakamiya, Tokihiko Kaburagi

Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints

Audio source separation is often achieved by estimating the magnitude spectrogram of each source, and then applying a phase recovery (or spectrogram inversion) algorithm to retrieve time-domain signals. Typically, spectrogram inversion is…

Sound · Computer Science 2023-07-03 Paul Magron, Tuomas Virtanen

Pattern Recognition in Vital Signs Using Spectrograms

Spectrograms visualize the frequency components of a given signal, which may be an audio signal or even a time-series signal. Audio signals have a higher sampling rate and high variability of frequency with time. Spectrograms can capture such…

Signal Processing · Electrical Eng. & Systems 2021-09-06 Sidharth Srivatsav Sribhashyam, Md Sirajus Salekin, Dmitry Goldgof, Ghada Zamzmi, Mark Last, Yu Sun

Self-supervised audio representation learning for mobile devices

We explore self-supervised models that can be potentially deployed on mobile devices to learn general purpose audio representations. Specifically, we propose methods that exploit the temporal context in the spectrogram domain. One method…

Audio and Speech Processing · Electrical Eng. & Systems 2019-05-29 Marco Tagliasacchi, Beat Gfeller, Félix de Chaumont Quitry, Dominik Roblek

An efficient supervised dictionary learning method for audio signal recognition

Machine hearing or listening represents an emerging area. Conventional approaches rely on the design of handcrafted features that are specialized to a specific audio task and can hardly be generalized to other audio fields. For example,…

Computer Vision and Pattern Recognition · Computer Science 2018-12-13 Imad Rida, Romain Hérault, Gilles Gasso

Joint Time-Frequency Scattering for Audio Classification

We introduce the joint time-frequency scattering transform, a time shift invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to…

Sound · Computer Science 2018-08-06 Joakim Andén, Vincent Lostanlen, Stéphane Mallat

Speech Denoising by Accumulating Per-Frequency Modeling Fluctuations

We present a method for audio denoising that combines processing done in both the time domain and the time-frequency domain. Given a noisy audio clip, the method trains a deep neural network to fit this signal. Since the fitting is only…

Sound · Computer Science 2020-06-11 Michael Michelashvili, Lior Wolf

Full-Frequency Temporal Patching and Structured Masking for Enhanced Audio Classification

Transformers and State-Space Models (SSMs) have advanced audio classification by modeling spectrograms as sequences of patches. However, existing models such as the Audio Spectrogram Transformer (AST) and Audio Mamba (AuM) adopt square…

Sound · Computer Science 2025-09-01 Aditya Makineni, Baocheng Geng, Qing Tian

Audio Difference Learning for Audio Captioning

This study introduces a novel training paradigm, audio difference learning, for improving audio captioning. The fundamental concept of the proposed learning method is to create a feature representation space that preserves the relationship…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-18 Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda

Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

Speech separation has been very successful with deep learning techniques. Substantial effort has been reported based on approaches over spectrogram, which is well known as the standard time-and-frequency cross-domain representation for…

Sound · Computer Science 2019-04-17 Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee