Related papers: MDCNN-SID: Multi-scale Dilated Convolution Network…
Flamenco singing is characterized by pitch instability, micro-tonal ornamentations, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult…
We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four…
Metaverse has stretched the real world into unlimited space. There will be more live concerts in Metaverse. The task of singer identification is to identify the song belongs to which singer. However, there has been a tough problem in singer…
We propose a flexible framework that deals with both singer conversion and singers vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…
Deep neural networks with convolutional layers usually process the entire spectrogram of an audio signal with the same time-frequency resolutions, number of filters, and dimensionality reduction scale. According to the constant-Q transform,…
Music source separation involves a large input field to model a long-term dependence of an audio signal. Previous convolutional neural network (CNN)-based approaches address the large input field modeling using sequentially down- and…
Recent approaches for music source separation are almost exclusively based on deep neural networks, mostly employing recurrent neural networks (RNNs). Although RNNs are in many cases superior than other types of deep neural networks for…
Previous attempts at music artist classification use frame level audio features which summarize frequency content within short intervals of time. Comparatively, more recent music information retrieval tasks take advantage of temporal…
The present paper describes a singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…
Professional vocalists modulate their voice timbre or pitch to make their vocal performance more expressive. Such fluctuations are called singing techniques. Automatic detection of singing techniques from audio tracks can be beneficial to…
Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and…
Previous approaches in singer identification have used one of monophonic vocal tracks or mixed tracks containing multiple instruments, leaving a semantic gap between these two domains of audio. In this paper, we present a system to learn a…
In this paper, we study the issue of automatic singer identification (SID) in popular music recordings, which aims to recognize who sang a given piece of song. The main challenge for this investigation lies in the fact that a singer's…
A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features…
Time Delay Neural Networks (TDNN)-based methods are widely used in dialect identification. However, in previous work with TDNN application, subtle variant is being neglected in different feature scales. To address this issue, we propose a…
Convolutional neural networks (CNN) recently gained notable attraction in a variety of machine learning tasks: including music classification and style tagging. In this work, we propose implementing intermediate connections to the CNN…
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…
Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer…
State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features. Recent studies attempted to extract speaker embeddings directly from raw waveforms and have shown…