Related papers: Generating Sample-Based Musical Instruments Using …

InstrumentGen: Generating Sample-Based Musical Instruments From Text

We introduce the text-to-instrument task, which aims at generating sample-based musical instruments based on textual prompts. Accordingly, we propose InstrumentGen, a model that extends a text-prompted generative audio framework to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-09 Shahan Nercessian, Johannes Imort

Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

Generative models have thrived in computer vision, enabling unprecedented image processes. Yet the results in audio remain less advanced. Our project targets real-time sound synthesis from a reduced set of high-level parameters, including…

Sound · Computer Science 2019-06-25 Adrien Bitton, Philippe Esling, Antoine Caillon, Martin Fouilleul

Sampling Variations of Lead Sheets

Machine-learning techniques have been recently used with spectacular results to generate artefacts such as music or text. However, these techniques are still unable to capture and generate artefacts that are convincingly structured. In this…

Artificial Intelligence · Computer Science 2017-03-03 Pierre Roy, Alexandre Papadopoulos, François Pachet

Audio Conditioning for Music Generation via Discrete Bottleneck Features

While most music generation models use textual or parametric conditioning (e.g. tempo, harmony, musical genre), we propose to condition a language model based music generation system with audio input. Our exploration involves two distinct…

Sound · Computer Science 2024-07-31 Simon Rouard, Yossi Adi, Jade Copet, Axel Roebel, Alexandre Défossez

AudioGen: Textually Guided Audio Generation

We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs. AudioGen operates…

Sound · Computer Science 2023-03-07 Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

Probing Audio-Generation Capabilities of Text-Based Language Models

How does textual representation of audio relate to the Large Language Model's (LLMs) learning about the audio world? This research investigates the extent to which LLMs can be prompted to generate audio, despite their primary training in…

Sound · Computer Science 2025-06-03 Arjun Prasaath Anbazhagan, Parteek Kumar, Ujjwal Kaur, Aslihan Akalin, Kevin Zhu, Sean O'Brien

Controllable Music Production with Diffusion Models and Guidance Gradients

We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include…

Sound · Computer Science 2023-12-06 Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

Simple and Controllable Music Generation

We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised…

Sound · Computer Science 2024-01-31 Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints

Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data…

Sound · Computer Science 2024-09-11 Haonan Chen, Jordan B. L. Smith, Janne Spijkervet, Ju-Chiang Wang, Pei Zou, Bochen Li, Qiuqiang Kong, Xingjian Du

Generation of Musical Timbres using a Text-Guided Diffusion Model

In recent years, text-to-audio systems have achieved remarkable success, enabling the generation of complete audio segments directly from text descriptions. While these systems also facilitate music creation, the element of human creativity…

Sound · Computer Science 2025-04-15 Weixuan Yuan, Qadeer Khan, Vladimir Golkov

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

Deep generative models have recently achieved impressive performance in speech and music synthesis. However, compared to the generation of those domain-specific sounds, generating general sounds (such as sirens and gunshots) has received less…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-07 Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space

This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing…

Sound · Computer Science 2025-10-07 Christian Limberg, Fares Schulz, Zhe Zhang, Stefan Weinzierl

Difficulty-Aware Score Generation for Piano Sight-Reading

Adapting learning materials to the level of skill of a student is important in education. In the context of music training, one essential ability is sight-reading -- playing unfamiliar scores at first sight -- which benefits from…

Sound · Computer Science 2025-09-23 Pedro Ramoneda, Masahiro Suzuki, Akira Maezawa, Xavier Serra

Computer Assisted Composition with Recurrent Neural Networks

Sequence modeling with neural networks has led to powerful models of symbolic music data. We address the problem of exploiting these models to reach creative musical goals, by combining with human input. To this end we generalise previous…

Artificial Intelligence · Computer Science 2017-10-03 Christian Walder, Dongwoo Kim

Embedding Alignment in Code Generation for Audio

LLM-powered code generation has the potential to revolutionize creative coding endeavors, such as live-coding, by enabling users to focus on structural motifs over syntactic details. In such domains, when prompting an LLM, users may benefit…

Multimedia · Computer Science 2025-09-25 Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited…

Sound · Computer Science 2023-08-04 Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Setting the rhythm scene: deep learning-based drum loop generation from arbitrary language cues

Generative artificial intelligence models can be a valuable aid to music composition and live performance, both to aid the professional musician and to help democratize the music creation process for hobbyists. Here we present a novel…

Sound · Computer Science 2022-09-22 Ignacio J. Tripodi

Evaluating Deep Music Generation Methods Using Data Augmentation

Despite advances in deep algorithmic music generation, evaluation of generated samples often relies on human evaluation, which is subjective and costly. We focus on designing a homogeneous, objective framework for evaluating samples of…

Sound · Computer Science 2022-01-04 Toby Godwin, Georgios Rizos, Alice Baird, Najla D. Al Futaisi, Vincent Brisse, Bjoern W. Schuller

Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions

Automatic music generation is an interdisciplinary research topic that combines computational creativity and semantic analysis of music to create automatic machine improvisations. An important property of such a system is allowing the user…

Sound · Computer Science 2020-03-03 Ke Chen, Gus Xia, Shlomo Dubnov

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer…

Sound · Computer Science 2017-02-14 Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio