Related papers: Unsupervised Audio Source Separation using Generat…

Unsupervised Source Separation via Bayesian Inference in the Latent Domain

State of the art audio source separation models rely on supervised data-driven approaches, which can be expensive in terms of labeling resources. On the other hand, approaches for training these models without any direct supervision are…

Machine Learning · Computer Science 2022-04-04 Michele Mancusi, Emilian Postolache, Giorgio Mariani, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

Unsupervised Source Separation By Steering Pretrained Music Models

We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a…

Sound · Computer Science 2021-10-26 Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely…

Sound · Computer Science 2023-02-01 Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau

Unsupervised Single-Channel Audio Separation with Diffusion Source Priors

Single-channel audio separation aims to separate individual sources from a single-channel mixture. Most existing methods rely on supervised learning with synthetically generated paired data. However, obtaining high-quality paired data in…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-24 Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai

Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction

The state of the art in music source separation employs neural networks trained in a supervised fashion on multi-track databases to estimate the sources from a given mixture. With only few datasets available, often extensive data…

Machine Learning · Computer Science 2018-04-09 Daniel Stoller, Sebastian Ewert, Simon Dixon

Music Source Separation with Generative Flow

Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art. However, such parallel data is often difficult to obtain, and it is cumbersome to adapt trained models to mixtures…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-30 Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Unsupervised Composable Representations for Audio

Current generative models are able to generate high-quality artefacts but have been shown to struggle with compositional reasoning, which can be defined as the ability to generate complex structures from simpler elements. In this paper, we…

Machine Learning · Computer Science 2024-08-20 Giovanni Bindi, Philippe Esling

Source Separation with Deep Generative Priors

Despite substantial progress in signal source separation, results for richly structured data continue to contain perceptible artifacts. In contrast, recent deep generative models can produce authentic samples in a variety of domains that…

Machine Learning · Computer Science 2020-09-22 Vivek Jayaram, John Thickstun

Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain

Gaussian process (GP) audio source separation is a time-domain approach that circumvents the inherent phase approximation issue of spectrogram based methods. Furthermore, through its kernel, GPs elegantly incorporate prior knowledge about…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-22 Pablo A. Alvarado, Mauricio A. Álvarez, Dan Stowell

On the Design of Deep Priors for Unsupervised Audio Restoration

Unsupervised deep learning methods for solving audio restoration problems extensively rely on carefully tailored neural architectures that carry strong inductive biases for defining priors in the time or spectral domain. In this context,…

Sound · Computer Science 2021-04-16 Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Andreas Spanias

Unsupervised Speech Enhancement using Data-defined Priors

The majority of deep learning-based speech enhancement methods require paired clean-noisy speech data. Collecting such data at scale in real-world conditions is infeasible, which has led the community to rely on synthetically generated…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-30 Dominik Klement, Matthew Maciejewski, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget

Improving Universal Sound Separation Using Sound Classification

Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of…

Sound · Computer Science 2021-05-14 Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning

Separating audio mixtures into individual instrument tracks has been a long standing challenging task. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning. Specifically, our loss…

Sound · Computer Science 2018-05-18 Ning Zhang, Junchi Yan, Yuchen Zhou

Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance

Speech separation is a fundamental task in audio processing, typically addressed with fully supervised systems trained on paired mixtures. While effective, such systems typically rely on synthetic data pipelines, which may not reflect…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-30 Runwu Shi, Kai Li, Chang Li, Jiang Wang, Sihan Tan, Kazuhiro Nakadai

Self-Supervised Learning from Automatically Separated Sound Scenes

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…

Sound · Computer Science 2021-09-16 Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision

While there has been much recent progress using deep learning techniques to separate speech and music audio signals, these systems typically require large collections of isolated sources during the training process. When extending audio…

Sound · Computer Science 2020-09-01 Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux

Score-based Source Separation with Applications to Digital Communication Signals

We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by…

Machine Learning · Computer Science 2024-01-18 Tejas Jayashankar, Gary C. F. Lee, Alejandro Lancho, Amir Weiss, Yury Polyanskiy, Gregory W. Wornell

Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model

Extracting individual elements from music mixtures is a valuable tool for music production and practice. While neural networks optimized to mask or transform mixture spectrograms into the individual source(s) have been the leading approach,…

Sound · Computer Science 2025-11-26 Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

ZeroSep: Separate Anything in Audio with Zero Training

Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep learning approaches, while powerful, are limited by the need for extensive,…

Sound · Computer Science 2025-05-30 Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu

Semantic Grouping Network for Audio Source Separation

Recently, audio-visual separation approaches have taken advantage of the natural synchronization between the two modalities to boost audio source separation performance. They extracted high-level semantics from visual inputs as the guidance…

Sound · Computer Science 2024-07-08 Shentong Mo, Yapeng Tian