Related papers: MDCNN-SID: Multi-scale Dilated Convolution Network…

Singer Identification Using Convolutional Acoustic Motif Embeddings

Flamenco singing is characterized by pitch instability, micro-tonal ornamentations, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-04 · Aitor Arronte Alvarez, Francisco Gomez-Martin

A Multi-modal Deep Neural Network approach to Bird-song identification

We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four…

Sound · Computer Science · 2018-11-13 · Botond Fazekas, Alexander Schindler, Thomas Lidy, Andreas Rauber

MetaSID: Singer Identification with Domain Adaptation for Metaverse

The Metaverse has stretched the real world into unlimited space, and more live concerts will take place there. The task of singer identification is to identify which singer a given song belongs to. However, there has been a tough problem in singer…

Sound · Computer Science · 2022-05-25 · Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-02-26 · Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-04-23 · Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Multi-Band Multi-Resolution Fully Convolutional Neural Networks for Singing Voice Separation

Deep neural networks with convolutional layers usually process the entire spectrogram of an audio signal with the same time-frequency resolutions, number of filters, and dimensionality reduction scale. According to the constant-Q transform,…

Sound · Computer Science · 2019-10-22 · Emad M. Grais, Fei Zhao, Mark D. Plumbley

D3Net: Densely connected multidilated DenseNet for music source separation

Music source separation involves a large input field to model a long-term dependence of an audio signal. Previous convolutional neural network (CNN)-based approaches address the large input field modeling using sequentially down- and…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-03-30 · Naoya Takahashi, Yuki Mitsufuji

Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation

Recent approaches for music source separation are almost exclusively based on deep neural networks, mostly employing recurrent neural networks (RNNs). Although RNNs are in many cases superior to other types of deep neural networks for…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-07-08 · Pyry Pyykkönen, Stylianos I. Mimilakis, Konstantinos Drossos, Tuomas Virtanen

Music Artist Classification with Convolutional Recurrent Neural Networks

Previous attempts at music artist classification use frame level audio features which summarize frequency content within short intervals of time. Comparatively, more recent music information retrieval tasks take advantage of temporal…

Sound · Computer Science · 2019-03-18 · Zain Nasrullah, Yue Zhao

Singing voice synthesis based on convolutional neural networks

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-06-26 · Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

PrimaDNN': A Characteristics-aware DNN Customization for Singing Technique Detection

Professional vocalists modulate their voice timbre or pitch to make their vocal performance more expressive. Such fluctuations are called singing techniques. Automatic detection of singing techniques from audio tracks can be beneficial to…

Sound · Computer Science · 2023-06-27 · Yuya Yamamoto, Juhan Nam, Hiroko Terasawa

Learning a Representation for Cover Song Identification Using Convolutional Neural Network

Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and…

Multimedia · Computer Science · 2019-11-04 · Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang

Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

Previous approaches in singer identification have used either monophonic vocal tracks or mixed tracks containing multiple instruments, leaving a semantic gap between these two domains of audio. In this paper, we present a system to learn a…

Sound · Computer Science · 2019-06-27 · Kyungyun Lee, Juhan Nam

Singer Identification Using Deep Timbre Feature Learning with KNN-Net

In this paper, we study the issue of automatic singer identification (SID) in popular music recordings, which aims to recognize who sang a given song. The main challenge for this investigation lies in the fact that a singer's…

Sound · Computer Science · 2021-02-23 · Xulong Zhang, Jiale Qian, Yi Yu, Yifu Sun, Wei Li

Musical instrument sound classification with deep convolutional neural network using feature fusion approach

A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features…

Sound · Computer Science · 2015-12-24 · Taejin Park, Taejin Lee

Dynamic Multi-scale Convolution for Dialect Identification

Time Delay Neural Network (TDNN)-based methods are widely used in dialect identification. However, previous TDNN-based work neglects subtle variations across different feature scales. To address this issue, we propose a…

Computation and Language · Computer Science · 2021-08-18 · Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan Song, Jinwen Huang, Huiyu Shi, Xiaorui Wang

Multi-scale Embedded CNN for Music Tagging (MsE-CNN)

Convolutional neural networks (CNNs) have recently gained notable attention in a variety of machine learning tasks, including music classification and style tagging. In this work, we propose implementing intermediate connections to the CNN…

Sound · Computer Science · 2019-06-18 · Nima Hamidi, Mohsen Vahidzadeh, Stephen Baek

Speaker Recognition from Raw Waveform with SincNet

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-08-12 · Mirco Ravanelli, Yoshua Bengio

Singer Identity Representation Learning using Self-Supervised Techniques

Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer…

Sound · Computer Science · 2024-01-11 · Bernardo Torres, Stefan Lattner, Gaël Richard

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding

State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features. Recent studies attempted to extract speaker embeddings directly from raw waveforms and have shown…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-06-10 · Ge Zhu, Fei Jiang, Zhiyao Duan