English
Related papers

Related papers: Unsupervised Composable Representations for Audio

200 papers

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely…

Human auditory perception is compositional in nature -- we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…

Sound · Computer Science 2025-03-04 Sripathi Sridhar , Mark Cartwright

State-of-the-art under-determined audio source separation systems rely on supervised end-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-29 Vivek Narayanaswamy , Jayaraman J. Thiagarajan , Rushil Anirudh , Andreas Spanias

Extracting individual elements from music mixtures is a valuable tool for music production and practice. While neural networks optimized to mask or transform mixture spectrograms into the individual source(s) have been the leading approach,…

Sound · Computer Science 2025-11-26 Genís Plaja-Roglans , Yun-Ning Hung , Xavier Serra , Igor Pereira

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on the denoising auto-encoder model that uses a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-02 Stylianos I. Mimilakis , Konstantinos Drossos , Gerald Schuller

In this paper, we address the problem of single-microphone speech separation in the presence of ambient noise. We propose a generative unsupervised technique that directly models both clean speech and structured noise components, training…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Yochai Yemini , Rami Ben-Ari , Sharon Gannot , Ethan Fetaya

Diffusion models have emerged as powerful deep generative techniques, producing high-quality and diverse samples in applications in various domains including audio. While existing reviews provide overviews, there remains limited in-depth…

Sound · Computer Science 2026-01-16 Ge Zhu , Yutong Wen , Zhiyao Duan

We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include…

Sound · Computer Science 2023-12-06 Mark Levy , Bruno Di Giorgi , Floris Weers , Angelos Katharopoulos , Tom Nickson

Separating the individual elements in a musical mixture is an essential process for music analysis and practice. While this is generally addressed using neural networks optimized to mask or transform the time-frequency representation of a…

Sound · Computer Science 2025-11-27 Genís Plaja-Roglans , Yun-Ning Hung , Xavier Serra , Igor Pereira

In this paper, we present a framework for contrastive learning for audio representations, in a self supervised frame work without access to any ground truth labels. The core idea in self supervised contrastive learning is to map an audio…

Sound · Computer Science 2021-03-18 Prateek Verma , Julius Smith

We propose DAVIS, a Diffusion-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task through generative learning. Existing methods typically frame sound separation as a mask-based regression…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Chao Huang , Susan Liang , Yapeng Tian , Anurag Kumar , Chenliang Xu

In this paper, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-11 Zhepei Wang , Cem Subakan , Xilin Jiang , Junkai Wu , Efthymios Tzinis , Mirco Ravanelli , Paris Smaragdis

This paper aims to apply a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This new type of method has recently shown outstanding results with…

Sound · Computer Science 2023-07-21 Svetlana Pavlova

Speech separation is a fundamental task in audio processing, typically addressed with fully supervised systems trained on paired mixtures. While effective, such systems typically rely on synthetic data pipelines, which may not reflect…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-30 Runwu Shi , Kai Li , Chang Li , Jiang Wang , Sihan Tan , Kazuhiro Nakadai

Single-channel audio separation aims to separate individual sources from a single-channel mixture. Most existing methods rely on supervised learning with synthetically generated paired data. However, obtaining high-quality paired data in…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-24 Runwu Shi , Chang Li , Jiang Wang , Rui Zhang , Nabeela Khan , Benjamin Yen , Takeshi Ashizawa , Kazuhiro Nakadai

Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-25 Eloi Moliner , Michal Švento , Alec Wright , Lauri Juvela , Pavel Rajmic , Vesa Välimäki

We propose DAVIS, a Diffusion-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task through generative learning. Existing methods typically frame sound separation as a mask-based regression…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Chao Huang , Susan Liang , Yapeng Tian , Anurag Kumar , Chenliang Xu

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…

Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored.…

We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a…

Sound · Computer Science 2021-10-26 Ethan Manilow , Patrick O'Reilly , Prem Seetharaman , Bryan Pardo
‹ Prev 1 2 3 10 Next ›