Related papers: Vocal melody extraction using patch-based CNN
Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of…
A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features…
Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing…
In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two…
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to…
Extraction of the predominant pitch from polyphonic audio is one of the fundamental tasks in the field of music information retrieval and computational musicology. To accomplish this task using machine learning, a large amount of labeled…
Music emotion recognition (MER) is usually regarded as a multi-label tagging task, and each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these…
We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, that…
This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured to allow the first CNN…
Melody estimation or melody extraction refers to the extraction of the primary or fundamental dominant frequency in a melody. This sequence of frequencies obtained represents the pitch of the dominant melodic line from recorded music audio…
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…
This paper explores the application of Convolutional Neural Networks (CNNs) for classifying emotions in speech through Mel Spectrogram representations of audio files. Traditional methods such as Gaussian Mixture Models and Hidden Markov…
In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score -- i.e. without listening…
Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only for conventional digital image material to recognize patterns, but also for feature extraction from digital imagery representing spectral and…
This study explores the design and application of Complex-Valued Convolutional Neural Networks (CVCNNs) in audio signal processing, with a focus on preserving and utilizing phase information often neglected in real-valued networks. We begin…
Convolutional neural network (CNN) modules are widely used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules has been limited by the dimensionality constraint of the…
Automated melodic phrase detection and segmentation is a classical task in content-based music information retrieval and also the key towards automated music structure analysis. However, traditional methods still cannot satisfy practical…
The present paper describes a singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…
We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time…
Polyphonic sound event detection (polyphonic SED) is an interesting but challenging task due to the concurrence of multiple sound events. Recently, SED methods based on convolutional neural networks (CNN) and recurrent neural networks (RNN)…