Related papers: A Practical Guide to Spectrogram Analysis for Audi…
Spectrograms visualize the frequency components of a given signal, which may be an audio signal or even a time-series signal. Audio signals have a higher sampling rate and high variability of frequency with time. Spectrograms can capture such…
Audio fingerprinting is a technique used to identify and match audio recordings based on their unique characteristics. It involves creating a condensed representation of an audio signal that can be used to quickly compare and match against…
Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are also often adopted for speech analysis. Initially, the primary motivator for spectrogram-based representations was…
The spectrogram is a classical DSP tool used to view signals in both time and frequency. Unfortunately, the Heisenberg Uncertainty Principle limits our ability to use it for detecting and measuring narrowband signal modulation in wideband…
Recent advancements in deep learning have significantly impacted the field of speech signal processing, particularly in the analysis and manipulation of complex spectrograms. This survey provides a comprehensive overview of the…
A finite-energy signal is represented by a square-integrable, complex-valued function $t\mapsto s(t)$ of a real variable $t$, interpreted as time. Similarly, a noisy signal is represented by a random process. Time-frequency analysis, a…
One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than…
Adsorption processes play a fundamental role in molecular transport through nanofluidic systems, but their signatures in measured signals are often hard to distinguish from other processes like diffusion. In this paper, we derive an…
While log-amplitude mel-spectrogram has widely been used as the feature representation for processing speech based on deep learning, the effectiveness of another aspect of speech spectrum, i.e., phase information, was shown recently for…
A number of signal processing and statistical methods can be used in analyzing either pieces of text or DNA sequences. These techniques can be used in a number of ways, such as determining authorship of documents, finding genes in DNA, and…
This paper studies a spectrum estimation method for the case that the samples are obtained at a rate lower than the Nyquist rate. The method is referred to as the correlogram for undersampled data. The algorithm partitions the spectrum into…
Spectroscopy has played the key role in revealing, and thereby understanding, the structure of atoms and molecules. A central drive in this field is the pursuit of higher precision and accuracy so that ever more subtle effects might be…
The spatial information of sound plays a crucial role in various situations, ranging from daily activities to advanced engineering technologies. To fully utilize its potential, numerous research studies on spatial audio signal processing…
Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet…
We propose a method using a long short-term memory (LSTM) network to estimate the noise power spectral density (PSD) of single-channel audio signals represented in the short-time Fourier transform (STFT) domain. An LSTM network common to…
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…
This paper investigates the problem of estimating the spectral power parameters of random analog sources using numerical measurements acquired with minimum digitization complexity. Therefore, spectral analysis has to be performed with…
Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf…
Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the…
This article addresses the measurement of the power spectrum of red noise processes at the lowest frequencies, where the minimum acquisition time is so long that it is impossible to average over a sequence of data records. Therefore, averaging…