Related papers: Vocal melody extraction using patch-based CNN

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of…

Sound · Computer Science 2022-02-03 Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

Musical instrument sound classification with deep convolutional neural network using feature fusion approach

A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features…

Sound · Computer Science 2015-12-24 Taejin Park, Taejin Lee

A Streamlined Encoder/Decoder Architecture for Melody Extraction

Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-19 Tsung-Han Hsieh, Li Su, Yi-Hsuan Yang

Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two…

Sound · Computer Science 2023-08-08 Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to…

Machine Learning · Computer Science 2017-05-31 Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen

Deep domain adaptation for polyphonic melody extraction

Extraction of the predominant pitch from polyphonic audio is one of the fundamental tasks in the field of music information retrieval and computational musicology. To accomplish this task using machine learning, a large amount of labeled…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-07 Kavya Ranjan Saxena, Vipul Arora

CNN based music emotion classification

Music emotion recognition (MER) is usually regarded as a multi-label tagging task, and each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these…

Multimedia · Computer Science 2017-04-20 Xin Liu, Qingcai Chen, Xiangping Wu, Yan Liu, Yang Liu

A holistic approach to polyphonic music transcription with neural networks

We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, that…

Sound · Computer Science 2019-10-29 Miguel A. Román, Antonio Pertusa, Jorge Calvo-Zaragoza

Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN

This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured to allow the first CNN…

Sound · Computer Science 2019-06-18 Anders Elowsson, Anders Friberg

Student-t Networks for Melody Estimation

Melody estimation or melody extraction refers to the extraction of the primary or fundamental dominant frequency in a melody. This sequence of frequencies obtained represents the pitch of the dominant melodic line from recorded music audio…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Udhav Gupta, Avi, Bhavesh Jain

Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-23 Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms

This paper explores the application of Convolutional Neural Networks (CNNs) for classifying emotions in speech through Mel Spectrogram representations of audio files. Traditional methods such as Gaussian Mixture Models and Hidden Markov…

Sound · Computer Science 2025-03-26 Niketa Penumajji

A Convolutional Approach to Melody Line Identification in Symbolic Scores

In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score -- i.e. without listening…

Sound · Computer Science 2021-12-28 Federico Simonetta, Carlos Cancino-Chacón, Stavros Ntalampiras, Gerhard Widmer

Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks

Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only for conventional digital image material to recognize patterns, but also for feature extraction from digital imagery representing spectral and…

Sound · Computer Science 2025-09-16 Friedrich Wolf-Monheim

Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications

This study explores the design and application of Complex-Valued Convolutional Neural Networks (CVCNNs) in audio signal processing, with a focus on preserving and utilizing phase information often neglected in real-valued networks. We begin…

Machine Learning · Computer Science 2025-10-14 Naman Agrawal

A fully recurrent feature extraction for single channel speech enhancement

Convolutional neural network (CNN) modules are widely used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules has been limited by the dimensionality constraint of the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-07 Muhammed PV Shifas, Santelli Claudio, Vassilis Tsiaras, Yannis Stylianou

Melodic Phrase Segmentation By Deep Neural Networks

Automated melodic phrase detection and segmentation is a classical task in content-based music information retrieval and also the key towards automated music structure analysis. However, traditional methods still cannot satisfy practical…

Machine Learning · Computer Science 2018-11-15 Yixing Guan, Jinyu Zhao, Yiqin Qiu, Zheng Zhang, Gus Xia

Singing voice synthesis based on convolutional neural networks

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-26 Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time…

Sound · Computer Science 2016-12-09 Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool

A Capsule based Approach for Polyphonic Sound Event Detection

Polyphonic sound event detection (polyphonic SED) is an interesting but challenging task due to the concurrence of multiple sound events. Recently, SED methods based on convolutional neural networks (CNN) and recurrent neural networks (RNN)…

Audio and Speech Processing · Electrical Eng. & Systems 2018-07-24 Yaming Liu, Jian Tang, Yan Song, Lirong Dai