Related papers: Vocal melody extraction using patch-based CNN

200 papers

Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of…

Sound · Computer Science 2022-02-03 Gurunath Reddy M , K. Sreenivasa Rao , Partha Pratim Das

A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features…

Sound · Computer Science 2015-12-24 Taejin Park , Taejin Lee

Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-19 Tsung-Han Hsieh , Li Su , Yi-Hsuan Yang

In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two…

Sound · Computer Science 2023-08-08 Keren Shao , Ke Chen , Taylor Berg-Kirkpatrick , Shlomo Dubnov

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to…

Machine Learning · Computer Science 2017-05-31 Emre Çakır , Giambattista Parascandolo , Toni Heittola , Heikki Huttunen , Tuomas Virtanen

Extraction of the predominant pitch from polyphonic audio is one of the fundamental tasks in the field of music information retrieval and computational musicology. To accomplish this task using machine learning, a large amount of labeled…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-07 Kavya Ranjan Saxena , Vipul Arora

Music emotion recognition (MER) is usually regarded as a multi-label tagging task, and each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these…

Multimedia · Computer Science 2017-04-20 Xin Liu , Qingcai Chen , Xiangping Wu , Yan Liu , Yang Liu

We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, that…

Sound · Computer Science 2019-10-29 Miguel A. Román , Antonio Pertusa , Jorge Calvo-Zaragoza

This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured to allow the first CNN…

Sound · Computer Science 2019-06-18 Anders Elowsson , Anders Friberg

Melody estimation or melody extraction refers to the extraction of the primary or fundamental dominant frequency in a melody. This sequence of frequencies obtained represents the pitch of the dominant melodic line from recorded music audio…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Udhav Gupta , Avi , Bhavesh Jain

The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-23 Kazuhiro Nakamura , Shinji Takaki , Kei Hashimoto , Keiichiro Oura , Yoshihiko Nankaku , Keiichi Tokuda

This paper explores the application of Convolutional Neural Networks (CNNs) for classifying emotions in speech through Mel Spectrogram representations of audio files. Traditional methods such as Gaussian Mixture Models and Hidden Markov…

Sound · Computer Science 2025-03-26 Niketa Penumajji

In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score -- i.e. without listening…

Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only for conventional digital image material to recognize patterns, but also for feature extraction from digital imagery representing spectral and…

Sound · Computer Science 2025-09-16 Friedrich Wolf-Monheim

This study explores the design and application of Complex-Valued Convolutional Neural Networks (CVCNNs) in audio signal processing, with a focus on preserving and utilizing phase information often neglected in real-valued networks. We begin…

Machine Learning · Computer Science 2025-10-14 Naman Agrawal

Convolutional neural network (CNN) modules are widely being used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules has been limited by the dimensionality constraint of the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-07 Muhammed PV Shifas , Santelli Claudio , Vassilis Tsiaras , Yannis Stylianou

Automated melodic phrase detection and segmentation is a classical task in content-based music information retrieval and also the key towards automated music structure analysis. However, traditional methods still cannot satisfy practical…

Machine Learning · Computer Science 2018-11-15 Yixing Guan , Jinyu Zhao , Yiqin Qiu , Zheng Zhang , Gus Xia

The present paper describes a singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-26 Kazuhiro Nakamura , Kei Hashimoto , Keiichiro Oura , Yoshihiko Nankaku , Keiichi Tokuda

We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time…

Sound · Computer Science 2016-12-09 Naoya Takahashi , Michael Gygli , Beat Pfister , Luc Van Gool

Polyphonic sound event detection (polyphonic SED) is an interesting but challenging task due to the concurrence of multiple sound events. Recently, SED methods based on convolutional neural networks (CNN) and recurrent neural networks (RNN)…

Audio and Speech Processing · Electrical Eng. & Systems 2018-07-24 Yaming Liu , Jian Tang , Yan Song , Lirong Dai