Related papers: MIDI-LAB, a Powerful Visual Basic Program for Crea…

MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation

We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens, and uses a two-stage training recipe to endow text-to-MIDI abilities. By…

Sound · Computer Science 2025-11-07 Shih-Lun Wu, Yoon Kim, Cheng-Zhi Anna Huang

MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music

Symbolic music research plays a crucial role in music-related machine learning, but MIDI data can be complex for those without musical expertise. To address this issue, we present MidiTok Visualizer, a web application designed to facilitate…

Sound · Computer Science 2024-10-29 Michał Wiszenko, Kacper Stefański, Piotr Malesa, Łukasz Pokorzyński, Mateusz Modrzejewski

MidiCaps: A large-scale MIDI dataset with text captions

Generative models guided by text prompts are increasingly becoming more popular. However, no text-to-MIDI models currently exist due to the lack of a captioned MIDI dataset. This work aims to enable research that combines LLMs with symbolic…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-08 Jan Melechovsky, Abhinaba Roy, Dorien Herremans

Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis

Current methods for creating drum loop audio in digital music production, such as using one-shot samples or resampling, often demand non-trivial efforts of creators. While recent generative models achieve high fidelity and adhere to text,…

Sound · Computer Science 2026-05-15 Shuyang Cui, Zhi Zhong, Qiyu Wu, Zachary Novack, Woosung Choi, Keisuke Toyama, Kin Wai Cheuk, Junghyun Koo, Yukara Ikemiya, Christian Simon, Chihiro Nagashima, Shusuke Takahashi

Foley Music: Learning to Generate Music from Videos

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments. We first identify two key intermediate representations for a successful video to music…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding

Recent advances in multimodal large language models (MLLM) for audio music have demonstrated strong capabilities in music understanding, yet symbolic music, a fundamental representation of musical structure, remains unexplored. In this…

Multimedia · Computer Science 2026-01-30 Meng Yang, Jon McCormack, Maria Teresa Llano, Wanchao Su, Chao Lei

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Multimodal music generation aims to produce music from diverse input modalities, including text, videos, and images. Existing methods use a common embedding space for multimodal fusion. Despite their effectiveness in other modalities, their…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Baisen Wang, Le Zhuo, Zhaokai Wang, Chenxi Bao, Wu Chengjing, Xuecheng Nie, Jiao Dai, Jizhong Han, Yue Liao, Si Liu

MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and…

Sound · Computer Science 2022-03-21 Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

Supporting Music Education through Visualizations of MIDI Recordings

Musicians mostly have to rely on their ears when they want to analyze what they play, for example to detect errors. Since hearing is sequential, it is not possible to quickly grasp an overview over one or multiple recordings of a whole…

Human-Computer Interaction · Computer Science 2026-03-26 Frank Heyen, Michael Sedlmair

MV-Crafter: An Intelligent System for Music-guided Video Generation

Music videos, as a prevalent form of multimedia entertainment, deliver engaging audio-visual experiences to audiences and have gained immense popularity among singers and fans. Creators can express their interpretations of music naturally…

Human-Computer Interaction · Computer Science 2025-04-25 Chuer Chen, Shengqi Dang, Yuqi Liu, Nanxuan Zhao, Yang Shi, Nan Cao

SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration

Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large…

Human-Computer Interaction · Computer Science 2024-02-22 Stephen Brade, Bryan Wang, Mauricio Sousa, Gregory Lee Newsome, Sageev Oore, Tovi Grossman

MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline

While end-to-end lyrics-to-song models offer convenience for casual users, professional songwriters require score-to-song systems that allow them to retain authorship over the core melody. However, existing score-to-song methods are limited…

Sound · Computer Science 2026-05-06 Fang-Duo Tsai, Yi-An Lai, Fei-Yueh Chen, Hsueh-Wei Fu, Wei-Jaw Lee, Hao-Chung Cheng, Yi-Hsuan Yang

Deep Learning Techniques for Music Generation -- A Survey

This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical…

Sound · Computer Science 2019-08-09 Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet

Midi Miner -- A Python library for tonal tension and track classification

We present a Python library, called Midi Miner, that can calculate tonal tension and classify different tracks. MIDI (Music Instrument Digital Interface) is a hardware and software standard for communicating musical events between digital…

Sound · Computer Science 2020-05-27 Rui Guo, Dorien Herremans, Thor Magnusson

Generative Disco: Text-to-Video Generation for Music Visualization

Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce…

Human-Computer Interaction · Computer Science 2023-09-29 Vivian Liu, Tao Long, Nathan Raw, Lydia Chilton

Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings

Rapid advancements in artificial intelligence have significantly enhanced generative tasks involving music and images, employing both unimodal and multimodal approaches. This research develops a model capable of generating music that…

Sound · Computer Science 2024-09-13 Tanisha Hisariya, Huan Zhang, Jinhua Liang

Local deployment of large-scale music AI models on commodity hardware

We present the MIDInfinite, a web application capable of generating symbolic music using a large-scale generative AI model locally on commodity hardware. Creating this demo involved porting the Anticipatory Music Transformer, a large…

Sound · Computer Science 2024-11-15 Xun Zhou, Charlie Ruan, Zihe Zhao, Tianqi Chen, Chris Donahue

Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music

Generating music is an interesting and challenging problem in the field of machine learning. Mimicking human creativity has been popular in recent years, especially in the field of computer vision and image processing. With the advent of…

Sound · Computer Science 2020-11-03 Ashish Ranjan, Varun Nagesh Jolly Behera, Motahar Reza

Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance

Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little-studied in a performance setting, where the results of…

Sound · Computer Science 2024-03-20 Victor Shepardson, Jack Armitage, Thor Magnusson

Calliope: An Online Generative Music System for Symbolic Multi-Track Composition

With the rise of artificial intelligence in recent years, there has been a rapid increase in its application towards creative domains, including music. There exist many systems built that apply machine learning approaches to the problem of…

Human-Computer Interaction · Computer Science 2025-04-22 Renaud Bougueng Tchemeube, Jeff Ens, Philippe Pasquier