Related papers: User-guided Generative Source Separation

PromptSep: Generative Audio Separation via Multimodal Prompting

Recent breakthroughs in language-queried audio source separation (LASS) have shown that generative models can achieve higher separation audio quality than traditional masking-based approaches. However, two key limitations restrict their…

Sound · Computer Science 2025-11-07 Yutong Wen, Ke Chen, Prem Seetharaman, Oriol Nieto, Jiaqi Su, Rithesh Kumar, Minje Kim, Paris Smaragdis, Zeyu Jin, Justin Salamon

Diffusion-based Generative Speech Source Separation

We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-03 Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference…

Sound · Computer Science 2024-03-19 Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

An Ensemble Approach to Music Source Separation: A Comparative Analysis of Conventional and Hierarchical Stem Separation

Music source separation (MSS) is a task that involves isolating individual sound sources, or stems, from mixed audio signals. This paper presents an ensemble approach to MSS, combining several state-of-the-art architectures to achieve…

Sound · Computer Science 2024-10-29 Saarth Vardhan, Pavani R Acharya, Samarth S Rao, Oorjitha Ratna Jasthi, S Natarajan

Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models

Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the…

Sound · Computer Science 2024-03-19 Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà

Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis

Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in…

Sound · Computer Science 2023-09-22 Ben Maman, Johannes Zeitler, Meinard Müller, Amit H. Bermano

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance

Masked generative models (MGMs) have shown impressive generative ability while providing an order of magnitude efficient sampling steps compared to continuous diffusion models. However, MGMs still underperform in image synthesis compared to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Jiwan Hur, Dong-Jae Lee, Gyojin Han, Jaehyun Choi, Yunho Jeon, Junmo Kim

Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures

Separating the individual elements in a musical mixture is an essential process for music analysis and practice. While this is generally addressed using neural networks optimized to mask or transform the time-frequency representation of a…

Sound · Computer Science 2025-11-27 Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

Multi-Source Music Generation with Latent Diffusion

Most music generation models directly generate a single music mixture. To allow for more flexible and controllable generation, the Multi-Source Diffusion Model (MSDM) has been proposed to model music as a mixture of multiple instrumental…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-18 Zhongweiyang Xu, Debottam Dutta, Yu-Lin Wei, Romit Roy Choudhury

Music Separation Enhancement with Generative Modeling

Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing…

Sound · Computer Science 2022-08-29 Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo

Controllable Music Production with Diffusion Models and Guidance Gradients

We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include…

Sound · Computer Science 2023-12-06 Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

Improving Music Source Separation with Diffusion and Consistency Refinement

In this work, we propose an approach to music source separation that uses a generative diffusion model as a last-stage refinement on top of a deterministic separator, progressively enhancing the separated sources through iterative…

Sound · Computer Science 2026-04-28 Tornike Karchkhadze, Mohammad Rasool Izadi, Shuo Zhang, Shlomo Dubnov

ZeroSep: Separate Anything in Audio with Zero Training

Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep learning approaches, while powerful, are limited by the need for extensive,…

Sound · Computer Science 2025-05-30 Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu

MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction

We present MGE-LDM, a unified latent diffusion framework for simultaneous music generation, source imputation, and query-driven source separation. Unlike prior approaches constrained to fixed instrument classes, MGE-LDM learns a joint…

Sound · Computer Science 2025-10-21 Yunkee Chae, Kyogu Lee

Music Source Separation with Generative Flow

Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art. However, such parallel data is often difficult to obtain, and it is cumbersome to adapt trained models to mixtures…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-30 Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model

Extracting individual elements from music mixtures is a valuable tool for music production and practice. While neural networks optimized to mask or transform mixture spectrograms into the individual source(s) have been the leading approach,…

Sound · Computer Science 2025-11-26 Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

EDSep: An Effective Diffusion-Based Method for Speech Source Separation

Generative models have attracted considerable attention for speech separation tasks, and among these, diffusion-based methods are being explored. Despite the notable success of diffusion techniques in generation tasks, their adaptation to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-28 Jinwei Dong, Xinsheng Wang, Qirong Mao

MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Jeffrey A. Chan-Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah

GASS: Generalizing Audio Source Separation with Large-scale Data

Universal source separation targets at separating the audio sources of an arbitrary mix, removing the constraint to operate on a specific domain like speech or music. Yet, the potential of universal source separation is limited because most…

Sound · Computer Science 2023-10-03 Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model

Similar to colorization in computer vision, instrument separation is to assign instrument labels (e.g. piano, guitar...) to notes from unlabeled mixtures which contain only performance information. To address the problem, we adopt diffusion…

Sound · Computer Science 2022-09-08 Sangjun Han, Hyeongrae Ihm, DaeHan Ahn, Woohyung Lim