Related papers: Multi-Task Audio Source Separation

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

In short video and live broadcasts, speech, singing voice, and background music often overlap and obscure each other. This complexity creates difficulties in structuring and recognizing the audio content, which may impair subsequent ASR and…

Sound · Computer Science 2024-04-18 Ye Bai, Chenxing Li, Hao Li, Yuanyuan Zhao, Xiaorui Wang

GASS: Generalizing Audio Source Separation with Large-scale Data

Universal source separation targets at separating the audio sources of an arbitrary mix, removing the constraint to operate on a specific domain like speech or music. Yet, the potential of universal source separation is limited because most…

Sound · Computer Science 2023-10-03 Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

Task-Aware Unified Source Separation

Several attempts have been made to handle multiple source separation tasks such as speech enhancement, speech separation, sound event separation, music source separation (MSS), or cinematic audio source separation (CASS) with a single…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-01 Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux

Jointly Detecting and Separating Singing Voice: A Multi-Task Approach

A main challenge in applying deep learning to music processing is the availability of training data. One potential solution is Multi-task Learning, in which the model also learns to solve related auxiliary tasks on additional datasets to…

Sound · Computer Science 2018-04-06 Daniel Stoller, Sebastian Ewert, Simon Dixon

CatNet: music source separation system with mix-audio augmentation

Music source separation (MSS) is the task of separating a music piece into individual sources, such as vocals and accompaniment. Recently, neural network based methods have been applied to address the MSS problem, and can be categorized…

Sound · Computer Science 2021-02-22 Xuchen Song, Qiuqiang Kong, Xingjian Du, Yuxuan Wang

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

Monaural source separation is important for many real world applications. It is challenging because, with only a single channel of information available, without any constraints, an infinite number of solutions are possible. In this paper,…

Sound · Computer Science 2015-10-02 Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

Benchmarks and leaderboards for sound demixing tasks

Music demixing is the task of separating different tracks from the given single audio signal into components, such as drums, bass, and vocals from the rest of the accompaniment. Separation of sources is useful for a range of areas,…

Sound · Computer Science 2024-05-08 Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva

Multi-scale Multi-band DenseNets for Audio Source Separation

This paper deals with the problem of audio source separation. To handle the complex and ill-posed nature of the problems of audio source separation, the current state-of-the-art approaches employ deep neural networks to obtain instrumental…

Sound · Computer Science 2017-06-30 Naoya Takahashi, Yuki Mitsufuji

A Two-Stage Band-Split Mamba-2 Network For Music Separation

Music source separation (MSS) aims to separate mixed music into its distinct tracks, such as vocals, bass, drums, and more. MSS is considered to be a challenging audio separation task due to the complexity of music signals. Although the RNN…

Sound · Computer Science 2024-09-16 Jinglin Bai, Yuan Fang, Jiajie Wang, Xueliang Zhang

MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

Deep neural networks have become an indispensable technique for audio source separation (ASS). It was recently reported that a variant of CNN architecture called MMDenseNet was successfully employed to solve the ASS problem of estimating…

Sound · Computer Science 2018-05-30 Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji

Deep Remix: Remixing Musical Mixtures Using a Convolutional Deep Neural Network

Audio source separation is a difficult machine learning problem and performance is measured by comparing extracted signals with the component source signals. However, if separation is motivated by the ultimate goal of re-mixing then…

Sound · Computer Science 2015-05-05 Andrew J. R. Simpson, Gerard Roma, Mark D. Plumbley

Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support

Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music, and effects stems from their mixture, is a relatively new subtask of audio source separation. To date, only one publicly available dataset exists for…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-27 Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife

MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment

Propelled by the breakthrough in deep generative models, audio-to-image generation has emerged as a pivotal cross-modal task that converts complex auditory signals into rich visual representations. However, previous works only focus on…

Sound · Computer Science 2025-12-11 Hao Zhou, Xiaobao Guo, Yuzhe Zhu, Adams Wai-Kin Kong

Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely…

Sound · Computer Science 2023-02-01 Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau

Learning to Separate Voices by Spatial Regions

We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating $4+$ sources with 2 microphones) they assume a known or fixed…

Sound · Computer Science 2022-07-18 Zhongweiyang Xu, Romit Roy Choudhury

Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator

Recent audio-visual generative models have made substantial progress in generating images from audio. However, existing approaches focus on generating images from single-class audio and fail to generate images from mixed audio. To address…

Computer Vision and Pattern Recognition · Computer Science 2025-04-28 Minjae Kang, Martim Brandão

Multitask learning for instrument activation aware music source separation

Music source separation is a core task in music information retrieval which has seen a dramatic improvement in the past years. Nevertheless, most of the existing systems focus exclusively on the problem of source separation itself and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-04 Yun-Ning Hung, Alexander Lerch

SADDEL: Joint Speech Separation and Denoising Model based on Multitask Learning

Speech data collected in real-world scenarios often encounters two issues. First, multiple sources may exist simultaneously, and the number of sources may vary with time. Second, the existence of background noise in recording is inevitable.…

Sound · Computer Science 2020-05-21 Yuan-Kuei Wu, Chao-I Tuan, Hung-yi Lee, Yu Tsao

An Ensemble Approach to Music Source Separation: A Comparative Analysis of Conventional and Hierarchical Stem Separation

Music source separation (MSS) is a task that involves isolating individual sound sources, or stems, from mixed audio signals. This paper presents an ensemble approach to MSS, combining several state-of-the-art architectures to achieve…

Sound · Computer Science 2024-10-29 Saarth Vardhan, Pavani R Acharya, Samarth S Rao, Oorjitha Ratna Jasthi, S Natarajan

PromptSep: Generative Audio Separation via Multimodal Prompting

Recent breakthroughs in language-queried audio source separation (LASS) have shown that generative models can achieve higher separation audio quality than traditional masking-based approaches. However, two key limitations restrict their…

Sound · Computer Science 2025-11-07 Yutong Wen, Ke Chen, Prem Seetharaman, Oriol Nieto, Jiaqi Su, Rithesh Kumar, Minje Kim, Paris Smaragdis, Zeyu Jin, Justin Salamon