
Audio query-based music source separation

Sound 2019-08-20 v1 Audio and Speech Processing

Abstract

In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Advances in deep learning have led to substantial progress in music source separation performance. However, most previous studies are restricted to separating a limited number of sources, such as vocals, drums, bass, and other. In this study, we propose a network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals. The proposed method consists of a Query-net and a Separator: given a query and a mixture, the Query-net encodes the query into the latent space, and the Separator estimates masks conditioned on the latent vector, which are then applied to the mixture for separation. The Separator can also generate masks using latent vectors obtained from the training samples, allowing separation in the absence of a query. We evaluate our method on the MUSDB18 dataset, and experimental results show that the proposed method can separate multiple sources with a single network. In addition, through further investigation of the latent space, we demonstrate that our method can generate continuous outputs via latent vector interpolation.
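
The mask-conditioning and latent-interpolation ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' network: `toy_separator`, its `weights` argument, and the per-frequency-bin sigmoid mask are hypothetical stand-ins chosen only to show how a latent vector can determine a mask that is multiplied with a mixture spectrogram, and how interpolating between two latent vectors yields an intermediate mask.

```python
import math


def sigmoid(x):
    """Squash a real number into (0, 1) so it can serve as a soft mask value."""
    return 1.0 / (1.0 + math.exp(-x))


def interpolate_latent(z_a, z_b, alpha):
    """Linearly interpolate between two latent vectors (alpha in [0, 1])."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def toy_separator(mixture_spec, z, weights):
    """Hypothetical stand-in for a Separator.

    mixture_spec: list of frames, each a list of magnitudes per frequency bin.
    z:            latent vector (in the paper it would come from the Query-net).
    weights:      one row of len(z) weights per frequency bin (assumed, not
                  learned here).
    Returns the masked spectrogram and the mask. The mask is simplified to one
    value per frequency bin, shared across all frames.
    """
    mask = [sigmoid(sum(w * zi for w, zi in zip(row, z))) for row in weights]
    estimate = [[m * x for m, x in zip(mask, frame)] for frame in mixture_spec]
    return estimate, mask


# Two made-up latent vectors standing in for two encoded queries.
z_a = [1.0, 0.0]
z_b = [0.0, 1.0]
weights = [[4.0, -4.0], [-4.0, 4.0], [0.0, 0.0]]
mixture = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]

z_mid = interpolate_latent(z_a, z_b, 0.5)
est_a, mask_a = toy_separator(mixture, z_a, weights)
est_mid, mask_mid = toy_separator(mixture, z_mid, weights)
```

In this toy setup, `z_a` emphasizes the first frequency bin and `z_b` the second, while the interpolated vector `z_mid` produces a mask that lies between the two, which mirrors the continuous outputs the abstract attributes to latent vector interpolation.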

Cite

@article{arxiv.1908.06593,
  title  = {Audio query-based music source separation},
  author = {Jie Hwan Lee and Hyeong-Seok Choi and Kyogu Lee},
  journal= {arXiv preprint arXiv:1908.06593},
  year   = {2019}
}

Comments

8 pages, 7 figures, Appearing in the proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR 2019) (camera-ready version)
