Related papers: White-box Audio VST Effect Programming

SerumRNN: Step by Step Audio VST Effect Programming

Learning to program an audio production VST synthesizer is a time-consuming process, usually obtained through inefficient trial and error and only mastered after years of experience. As an educational and creative tool for sound designers,…

Sound · Computer Science 2021-04-12 Christopher Mitcheltree, Hideki Koike

Differentiable Signal Processing With Black-Box Audio Effects

We present a data-driven approach to automate audio signal processing by incorporating stateful third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-12 Marco A. Martínez Ramírez, Oliver Wang, Paris Smaragdis, Nicholas J. Bryan

Style Transfer for Non-differentiable Audio Effects

Digital audio effects are widely used by audio engineers to alter the acoustic and temporal qualities of audio data. However, these effects can have a large number of parameters which can make them difficult to learn for beginners and…

Machine Learning · Computer Science 2023-10-02 Kieran Grant

Sample-Constrained Black Box Optimization for Audio Personalization

We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter $h^*$ which, applied to any music or speech, will maximize the user's satisfaction. This is a black-box optimization problem since…

Sound · Computer Science 2025-07-18 Rajalaxmi Rajagopalan, Yu-Lin Wei, Romit Roy Choudhury

Differentiable Black-box and Gray-box Modeling of Nonlinear Audio Effects

Audio effects are extensively used at every stage of audio and music content creation. The majority of differentiable audio effects modeling approaches fall into the black-box or gray-box paradigms; and most models have been proposed and…

Sound · Computer Science 2025-02-21 Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark

Speech audio in the wild is often processed by post-production effects, but existing speech datasets rarely provide precise annotations of effects and parameters, limiting systematic study. We introduce VoxEffects, a speech audio effects…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-15 Zhe Zhang, Yigitcan Özer, Junichi Yamagishi

FxSearcher: Gradient-free Text-driven Audio Transformation

Achieving diverse and high-quality audio transformations from text prompts remains challenging, as existing methods are fundamentally constrained by their reliance on a limited set of differentiable audio effects. This paper proposes…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-21 Hojoon Ki, Jongsuk Kim, Minchan Kwon, Junmo Kim

Towards democratizing music production with AI - Design of Variational Autoencoder-based Rhythm Generator as a DAW plugin

There has been significant progress in the music generation technique utilizing deep learning. However, it is still hard for musicians and artists to use these techniques in their daily music-making practice. This paper proposes a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-06 Nao Tokui

DDSP-SFX: Acoustically-guided sound effects generation with differentiable digital signal processing

Controlling the variations of sound effects using neural audio synthesis models has been a difficult task. Differentiable digital signal processing (DDSP) provides a lightweight solution that achieves high-quality sound synthesis while…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-18 Yunyi Liu, Craig Jin, David Gunawan

Style Transfer of Audio Effects with Differentiable Signal Processing

We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording…

Sound · Computer Science 2022-07-19 Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures

General-purpose audio representations have proven effective across diverse music information retrieval applications, yet their utility in intelligent music production remains limited by insufficient understanding of audio effects (Fx).…

Sound · Computer Science 2025-07-04 Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji

Real-time Implementation of Vibrato Transfer as an Audio Effect

An algorithm for deriving delay functions based on real examples of vibrato was recently introduced and can be used to perform a vibrato transfer, in which the vibrato pattern of a target signal is imparted onto an incoming sound using a…

Sound · Computer Science 2025-09-29 Jeremy Hyrkas

Generating sound effects with controllable variations is a challenging task, traditionally addressed using sophisticated physical models that require in-depth knowledge of signal processing parameters and algorithms. In the era of…

Sound · Computer Science 2024-12-30 Yunyi Liu, Craig Jin

Deep Performer: Score-to-Audio Music Performance Synthesis

Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music…

Sound · Computer Science 2022-02-22 Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, Julian McAuley

DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks

In contemporary popular music production, drum sound design is commonly performed by cumbersome browsing and processing of pre-recorded samples in sound libraries. One can also use specialized synthesis hardware, typically controlled…

Sound · Computer Science 2022-06-30 Javier Nistal, Cyran Aouameur, Ithan Velarde, Stefan Lattner

ReverbFX: A Dataset of Room Impulse Responses Derived from Reverb Effect Plugins for Singing Voice Dereverberation

We present ReverbFX, a new room impulse response (RIR) dataset designed for singing voice dereverberation research. Unlike existing datasets based on real recorded RIRs, ReverbFX features a diverse collection of RIRs captured from various…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-14 Julius Richter, Till Svajda, Timo Gerkmann

LLM2Fx-Tools: Tool Calling For Music Post-Production

This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chain) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs,…

Sound · Computer Science 2026-01-30 Seungheon Doh, Junghyun Koo, Marco A. Martínez-Ramírez, Woosung Choi, Wei-Hsiang Liao, Qiyu Wu, Juhan Nam, Yuki Mitsufuji

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding

We introduce SoundVista, a method to generate the ambient sound of an arbitrary scene at novel viewpoints. Given a pre-acquired recording of the scene from sparsely distributed microphones, SoundVista can synthesize the sound of that scene…

Sound · Computer Science 2025-04-09 Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla, Christian Richardt, Dejan Markovic, Jake Sandakly, Steven Krenn, Todd Keebler, Eli Shlizerman, Alexander Richard

HiFi-VC: High Quality ASR-Based Voice Conversion

The goal of voice conversion (VC) is to convert input voice to match the target speaker's voice while keeping text and prosody intact. VC is usually used in entertainment and speaking-aid systems, as well as applied for speech data…

Sound · Computer Science 2022-04-01 A. Kashkin, I. Karpukhin, S. Shishkin

Diff-MST: Differentiable Mixing Style Transfer

Mixing style transfer automates the generation of a multitrack mix for a given set of tracks by inferring production attributes from a reference song. However, existing systems for mixing style transfer are limited in that they often…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-15 Soumya Sai Vanka, Christian Steinmetz, Jean-Baptiste Rolland, Joshua Reiss, George Fazekas