Related papers: Task-Aware Unified Source Separation

FasTUSS: Faster Task-Aware Unified Source Separation

Time-Frequency (TF) dual-path models are currently among the best performing audio source separation network architectures, achieving state-of-the-art performance in speech enhancement, music source separation, and cinematic audio source…

Sound · Computer Science 2025-07-16 Francesco Paissan, Gordon Wichern, Yoshiki Masuyama, Ryo Aihara, François G. Germain, Kohei Saijo, Jonathan Le Roux

GASS: Generalizing Audio Source Separation with Large-scale Data

Universal source separation targets at separating the audio sources of an arbitrary mix, removing the constraint to operate on a specific domain like speech or music. Yet, the potential of universal source separation is limited because most…

Sound · Computer Science 2023-10-03 Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

Multi-Task Audio Source Separation

The audio source separation tasks, such as speech enhancement, speech separation, and music source separation, have achieved impressive performance in recent studies. The powerful modeling capabilities of deep neural networks give us hope…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-15 Lu Zhang, Chenxing Li, Feng Deng, Xiaorui Wang

TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation

Source separation is a fundamental task in speech, music, and audio processing, and it also provides cleaner and larger data for training generative models. However, improving separation performance in practice often depends on increasingly…

Sound · Computer Science 2025-10-15 Yongsheng Feng, Yuetonghui Xu, Jiehui Luo, Hongjia Liu, Xiaobing Li, Feng Yu, Wei Li

Universal Source Separation with Weakly Labelled Data

Universal source separation (USS) is a fundamental research task for computational auditory scene analysis, which aims to separate mono recordings into individual source tracks. There are three potential challenges awaiting the solution to…

Sound · Computer Science 2023-05-15 Qiuqiang Kong, Ke Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Mark D. Plumbley

USE: A Unified Model for Universal Sound Separation and Extraction

Sound separation (SS) and target sound extraction (TSE) are fundamental techniques for addressing complex acoustic scenarios. While existing SS methods struggle with determining the unknown number of sound sources, TSE approaches require…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-25 Hongyu Wang, Chenda Li, Xin Zhou, Shuai Wang, Yanmin Qian

What's All the FUSS About Free Universal Sound Separation Data?

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio…

Sound · Computer Science 2020-11-03 Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

In short video and live broadcasts, speech, singing voice, and background music often overlap and obscure each other. This complexity creates difficulties in structuring and recognizing the audio content, which may impair subsequent ASR and…

Sound · Computer Science 2024-04-18 Ye Bai, Chenxing Li, Hao Li, Yuanyuan Zhao, Xiaorui Wang

Self-Guided Target Sound Extraction and Classification Through Universal Sound Separation Model and Multiple Clues

This paper introduces a multi-stage self-directed framework designed to address the spatial semantic segmentation of sound scene (S5) task in the DCASE 2025 Task 4 challenge. This framework integrates models focused on three distinct tasks:…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-18 Younghoo Kwon, Dongheon Lee, Dohwan Kim, Jung-Woo Choi

Multitask learning for instrument activation aware music source separation

Music source separation is a core task in music information retrieval which has seen a dramatic improvement in the past years. Nevertheless, most of the existing systems focus exclusively on the problem of source separation itself and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-04 Yun-Ning Hung, Alexander Lerch

A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis

We propose a unified model for three inter-related tasks: 1) to separate individual sound sources from a mixed music audio, 2) to transcribe each sound source to MIDI notes, and 3) to synthesize new pieces based…

Sound · Computer Science 2021-08-10 Liwei Lin, Qiuqiang Kong, Junyan Jiang, Gus Xia

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting. This is…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-13 Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Audio Prompt Tuning for Universal Sound Separation

Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-01 Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

Spatial Aware Multi-Task Learning Based Speech Separation

During the Covid, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from other family members, roommates, office-mates not…

Sound · Computer Science 2022-07-22 Wei Sun, Mei Wang, Lili Qiu

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-07 Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Cinematic Audio Source Separation Using Visual Cues

Cinematic Audio Source Separation (CASS) aims to decompose mixed film audio into speech, music, and sound effects, enabling applications like dubbing and remastering. Existing CASS approaches are audio-only, overlooking the inherent…

Multimedia · Computer Science 2026-03-30 Kang Zhang, Suyeon Lee, Arda Senocak, Joon Son Chung

Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely…

Sound · Computer Science 2023-02-01 Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau

Music Source Separation with Generative Flow

Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art. However, such parallel data is often difficult to obtain, and it is cumbersome to adapt trained models to mixtures…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-30 Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Preserving background sound in noise-robust voice conversion via multi-task learning

Background sound is an informative form of art that is helpful in providing a more immersive experience in real-application voice conversion (VC) scenarios. However, prior research about VC, mainly focusing on clean voices, pay rare…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-08 Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing performance of task-specific models. Motivated by this, we ask: can we build a single…

Computation and Language · Computer Science 2024-04-04 Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe