Related papers: Point Cloud Audio Processing
Since the evolution of digital computers, the storage of data has always been in terms of discrete bits that can store values of either 1 or 0. Hence, all computer programs (such as MATLAB), convert any input continuous signal into a…
The short-time Fourier transform (STFT) represents a window of audio samples as a set of complex coefficients. These are advantageously viewed as magnitudes and phases and the overall distribution of phases is very often assumed to be…
Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. Visual displays of an audio…
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer(PCT) for point cloud learning. PCT is based on…
Image Representation learning via input reconstruction is a common technique in machine learning for generating representations that can be effectively utilized by arbitrary downstream tasks. A well-established approach is using…
The short-time Fourier transform (STFT) usually computes the same number of frequency components as the frame length while overlapping adjacent time frames by more than half. As a result, the number of components of a spectrogram matrix…
We present STFTCodec, a novel spectral-based neural audio codec that efficiently compresses audio using Short-Time Fourier Transform (STFT). Unlike waveform-based approaches that require large model capacity and substantial memory…
Point clouds can be regarded as discrete samples of smooth manifolds and are typically analyzed via the eigenfunctions of the Laplace-Beltrami operator. This paper extends manifold spectral analysis to the fractional domain, enabling…
Convolutional neural network (CNN) architectures have originated and revolutionized machine learning for images. In order to take advantage of CNNs in predictive modeling with audio data, standard FFT-based signal processing methods are…
This paper addresses the problem of under-determinded speech source separation from multichannel microphone singals, i.e. the convolutive mixtures of multiple sources. The time-domain signals are first transformed to the short-time Fourier…
Three-dimensional point clouds can be viewed as discrete samples of smooth manifolds, allowing spectral analysis using the Laplace-Beltrami operator (LBO). However, the traditional point cloud manifold harmonic transform (PMHT) is limited…
Despite the advancements in cutting-edge technologies, audio signal processing continues to pose challenges and lacks the precision of a human speech processing system. To address these challenges, we propose a novel approach to simplify…
In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the draw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier…
The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this…
For audio source separation applications, it is common to estimate the magnitude of the short-time Fourier transform (STFT) of each source. In order to further synthesizing time-domain signals, it is necessary to recover the phase of the…
Point cloud filtering is a fundamental problem in geometry modeling and processing. Despite of significant advancement in recent years, the existing methods still suffer from two issues: 1) they are either designed without preserving sharp…
In this report we describe an ongoing line of research for solving single-channel source separation problems. Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency…
Given the rapid development of 3D scanners, point clouds are becoming popular in AI-driven machines. However, point cloud data is inherently sparse and irregular, causing significant difficulties for machine perception. In this work, we…
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…
Foundation models (FMs) are increasingly spearheading recent advances on a variety of tasks that fall under the purview of computer audition -- the use of machines to understand sounds. They feature several advantages over traditional…