Related papers: DExter: Learning and Controlling Performance Expre…

DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models

Understanding and explaining the behavior of machine learning models is essential for building transparent and trustworthy AI systems. We introduce DEXTER, a data-free framework that employs diffusion models and large language models to…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Simone Carnemolla, Matteo Pennisi, Sarinda Samarasinghe, Giovanni Bellitto, Simone Palazzo, Daniela Giordano, Mubarak Shah, Concetto Spampinato

PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing

We present PANDORA, a novel diffusion-based policy learning framework designed specifically for dexterous robotic piano performance. Our approach employs a conditional U-Net architecture enhanced with FiLM-based global conditioning, which…

Machine Learning · Computer Science 2025-03-20 Yanjia Huang, Renjie Li, Zhengzhong Tu

RenderBox: Expressive Performance Rendering with Text Control

Expressive music performance rendering involves interpreting symbolic scores with variations in timing, dynamics, articulation, and instrument-specific techniques, resulting in performances that capture musical and emotional intent. We…

Audio and Speech Processing · Electrical Eng. & Systems 2025-02-12 Huan Zhang, Akira Maezawa, Simon Dixon

Brain-Driven Representation Learning Based on Diffusion Model

Interpreting EEG signals linked to spoken language presents a complex challenge, given the data's intricate temporal and spatial attributes, as well as the various noise factors. Denoising diffusion probabilistic models (DDPMs), which have…

Computation and Language · Computer Science 2023-11-15 Soowon Kim, Seo-Hyun Lee, Young-Eun Lee, Ji-Won Lee, Ji-Ha Park, Seong-Whan Lee

Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis

Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in…

Sound · Computer Science 2023-09-22 Ben Maman, Johannes Zeitler, Meinard Müller, Amit H. Bermano

Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP

We present a framework for real-time human-AI musical co-performance, in which a latent diffusion model generates instrumental accompaniment in response to a live stream of context audio. The system combines a MAX/MSP front-end-handling…

Sound · Computer Science 2026-04-10 Tornike Karchkhadze, Shlomo Dubnov

Neural Network Diffusion

Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an…

Machine Learning · Computer Science 2025-01-03 Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

Expressive Music Data Processing and Generation

Musical expressivity and coherence are indispensable in music composition and performance, while often neglected in modern AI generative models. In this work, we introduce a listening-based data-processing technique that captures the…

Sound · Computer Science 2025-03-18 Jingwei Liu

Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls

We propose Polyffusion, a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations. The model is capable of controllable music generation with two paradigms: internal control and…

Sound · Computer Science 2023-07-21 Lejun Min, Junyan Jiang, Gus Xia, Jingwei Zhao

Controllable Music Production with Diffusion Models and Guidance Gradients

We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include…

Sound · Computer Science 2023-12-06 Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

Deep Generative Models of Music Expectation

A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation…

Sound · Computer Science 2023-10-06 Ninon Lizé Masclef, T. Anderson Keller

Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation

Recent advances in latent diffusion models have demonstrated state-of-the-art performance in high-dimensional time-series data synthesis while providing flexible control through conditioning and guidance. However, existing methodologies…

Machine Learning · Computer Science 2025-11-11 Matteo Pettenó, Alessandro Ilic Mezza, Alberto Bernardini

Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing…

Machine Learning · Computer Science 2023-05-17 Simon Alexanderson, Rajmund Nagy, Jonas Beskow, Gustav Eje Henter

On the Characterization of Expressive Performance in Classical Music: First Results of the Con Espressione Game

A piece of music can be expressively performed, or interpreted, in a variety of ways. With the help of an online questionnaire, the Con Espressione Game, we collected some 1,500 descriptions of expressive character relating to 45…

Sound · Computer Science 2020-08-06 Carlos Cancino-Chacón, Silvan Peter, Shreyan Chowdhury, Anna Aljanaki, Gerhard Widmer

Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-Supervised Learning

We propose a system for rendering a symbolic piano performance with flexible musical expression. It is necessary to actively control musical expression for creating a new music performance that conveys various emotions or nuances. However,…

Sound · Computer Science 2022-09-07 Seungyeon Rhyu, Sarah Kim, Kyogu Lee

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Current perceptive models heavily depend on resource-intensive datasets, prompting the need for innovative solutions. Leveraging recent advances in diffusion models, synthetic data, by constructing image inputs from various annotations,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-21 Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang

Combining audio control and style transfer using latent diffusion

Deep generative models are now able to synthesize high-quality audio signals, shifting the critical aspect in their development from audio quality to control capabilities. Although text-to-music generation is getting largely adopted by the…

Sound · Computer Science 2024-08-02 Nils Demerlé, Philippe Esling, Guillaume Doras, David Genova

Basis-Function Modeling of Loudness Variations in Ensemble Performance

This paper describes a computational model of loudness variations in expressive ensemble performance. The model predicts and explains the continuous variation of loudness as a function of information extracted automatically from the written…

Sound · Computer Science 2016-12-19 Thassilo Gadermaier, Maarten Grachten, Carlos Eduardo Cancino Chacón

Diffusion-based Signal Refiner for Speech Enhancement and Separation

Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-11 Masato Hirano, Ryosuke Sawata, Naoki Murata, Shusuke Takahashi, Yuki Mitsufuji