Related papers: Learning to Denoise Historical Music
The following article introduces a new parametric synthesis algorithm for sound textures inspired by existing methods used for visual textures. Using a 2D Convolutional Neural Network (CNN), a sound signal is modified until the temporal…
Deep learning-based techniques for automatic dysarthric speech detection have recently attracted interest in the research community. State-of-the-art techniques typically learn neurotypical and dysarthric discriminative representations by…
Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing them to train a neural text-to-speech (TTS) model is difficult. The proportion of clean speech is insufficient…
The use of deep learning to solve problems in literary arts has been a recent trend that has gained a lot of attention and automated generation of music has been an active area. This project deals with the generation of music using raw…
This study addresses the task of performing robust and reliable time-delay estimation in signals in noisy and reverberating environments. In contrast to the popular signal processing based methods, this paper proposes to transform the input…
Variations of deep neural networks such as convolutional neural network (CNN) have been successfully applied to image denoising. The goal is to automatically learn a mapping from a noisy image to a clean image given training data consisting…
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with…
Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the…
Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction…
Score-based stochastic denoising models have recently been demonstrated as powerful machine learning tools for conditional and unconditional image generation. The existing methods are based on a forward stochastic process wherein the…
Deep neural networks have shown promise for music audio signal processing applications, often surpassing prior approaches, particularly as end-to-end models in the waveform domain. Yet results to date have tended to be constrained by low…
Image denoising is an important low-level computer vision task, which aims to reconstruct a noise-free and high-quality image from a noisy image. With the development of deep learning, convolutional neural network (CNN) has been gradually…
This thesis develops a Transformer model based on Whisper, which extracts melodies and chords from music audio and records them into ABC notation. A comprehensive data processing workflow is customized for ABC notation, including data…
Noise is usually regarded as adversarial to extract the effective dynamics from time series, such that the conventional data-driven approaches usually aim at learning the dynamics by mitigating the noisy effect. However, noise can have a…
Vocal dereverberation remains a challenging task in audio processing, particularly for real-time applications where both accuracy and efficiency are crucial. Traditional deep learning approaches often struggle to suppress reverberation…
Deep neural networks (DNNs) have achieved remarkable success across diverse domains, but their performance can be severely degraded by noisy or corrupted training data. Conventional noise mitigation methods often rely on explicit…
We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the…
The sparse layouts of radio interferometers result in an incomplete sampling of the sky in Fourier space which leads to artifacts in the reconstructed images. Cleaning these systematic effects is essential for the scientific use of…
Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvements to its naturalness and architectural efficiency. In this work, we propose a…
FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand,…