Related papers: Detecting Music Performance Errors with Transforme…

LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection

Music learners can greatly benefit from tools that accurately detect errors in their practice. Existing approaches typically compare audio recordings to music scores using heuristics or learnable models. This paper introduces LadderSym, a…

Sound · Computer Science 2026-03-05 Benjamin Shiue-Hal Chou, Purvish Jajal, Nick John Eliopoulos, James C. Davis, George K. Thiruvathukal, Kristen Yeon-Ji Yun, Yung-Hsiang Lu

Deep Performer: Score-to-Audio Music Performance Synthesis

Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music…

Sound · Computer Science 2022-02-22 Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, Julian McAuley

Real-time error correction and performance aid for MIDI instruments

Making a slight mistake during live music performance can easily be spotted by an astute listener, even if the performance is an improvisation or an unfamiliar piece. An example might be a highly dissonant chord played by mistake in a…

Sound · Computer Science 2020-12-01 Georgi Marinov

Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

Polyphonic music generation is still a challenging direction due to its correct between generating melody and harmony. Most previous studies used RNN-based models. However, the RNN-based models are hard to establish the relationship…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-08 Jiuyang Zhou, Hong Zhu, Xingping Wang

Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy

The advancement of machine learning in audio analysis has opened new possibilities for technology-enhanced music education. This paper introduces a framework for automatic singing mistake detection in the context of music pedagogy,…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-09 Sumit Kumar, Suraj Jaiswal, Parampreet Singh, Vipul Arora

Continuous Learning of Transformer-based Audio Deepfake Detection

This paper proposes a novel framework for audio deepfake detection with two main objectives: i) attaining the highest possible accuracy on available fake data, and ii) effectively performing continuous learning on new fake data in a…

Sound · Computer Science 2024-09-11 Tuan Duy Nguyen Le, Kah Kuan Teh, Huy Dat Tran

RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection

This study introduces RUMAA, a transformer-based framework for music performance analysis that unifies score-to-performance alignment, score-informed transcription, and mistake detection in a near end-to-end manner. Unlike prior methods…

Sound · Computer Science 2025-07-17 Sungkyun Chang, Simon Dixon, Emmanouil Benetos

Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model

This study aims to enhance the quality of music generation using Transformers by incorporating meta-information. While Transformer-based approaches are effective at capturing long-term dependencies in musical compositions, the music they…

Sound · Computer Science 2026-05-21 Shinnosuke Taksuka, Hideo Mukai

YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation

Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument. This task is challenging for modeling as it requires simultaneously identifying multiple instruments and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-02 Sungkyun Chang, Emmanouil Benetos, Holger Kirchhoff, Simon Dixon

Deep convolutional neural networks for predominant instrument recognition in polyphonic music

Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, or can make music…

Sound · Computer Science 2016-12-28 Yoonchang Han, Jaehun Kim, Kyogu Lee

Multi-Genre Music Transformer -- Composing Full Length Musical Piece

In the task of generating music, the art factor plays a big role and is a great challenge for AI. Previous work involving adversarial training to produce new music pieces and modeling the compatibility of variety in music (beats, tempo,…

Sound · Computer Science 2023-01-09 Abhinav Kaushal Keshari

Evaluating Fake Music Detection Performance Under Audio Augmentations

With the rapid advancement of generative audio models, distinguishing between human-composed and generated music is becoming increasingly challenging. As a response, models for detecting fake music have been proposed. In this work, we…

Sound · Computer Science 2025-07-15 Tomasz Sroka, Tomasz Wężowicz, Dominik Sidorczuk, Mateusz Modrzejewski

Multitrack Music Transformer

Existing approaches for generating multitrack music with transformer models have been limited in terms of the number of instruments, the length of the music segments and slow inference. This is partly due to the memory requirements of the…

Sound · Computer Science 2023-05-26 Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick

End-to-end Piano Performance-MIDI to Score Conversion with Transformers

The automated creation of accurate musical notation from an expressive human performance is a fundamental task in computational musicology. To this end, we present an end-to-end deep learning approach that constructs detailed musical scores…

Sound · Computer Science 2024-10-02 Tim Beyer, Angela Dai

Detecting Notational Errors in Digital Music Scores

Music scores are used to precisely store music pieces for transmission and preservation. To represent and manipulate these complex objects, various formats have been tailored for different use cases. While music notation follows specific…

Multimedia · Computer Science 2025-10-06 Géré Léo, Nicolas Audebert, Florent Jacquemard

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we…

Sound · Computer Science 2023-10-17 Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

Pivotuner: automatic real-time pure intonation and microtonal modulation

Pivotuner is a VST3/AU MIDI effect plugin that automatically tunes note data in an adaptive pure intonation, in real time. Where previously pure intonation was out of reach for most musicians due to difficulty and impracticality, Pivotuner…

Multimedia · Computer Science 2023-06-07 Dmitri Volkov

Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls

The "pretraining-and-finetuning" paradigm has become a norm for training domain-specific models in natural language processing and computer vision. In this work, we aim to examine this paradigm for symbolic music generation through…

Sound · Computer Science 2023-11-22 Weihan Xu, Julian McAuley, Shlomo Dubnov, Hao-Wen Dong

Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol

Recent advances in song identification leverage deep neural networks to learn compact audio fingerprints directly from raw waveforms. While these methods perform well under controlled conditions, their accuracy drops significantly in…

Sound · Computer Science 2025-09-16 Christos Nikou, Theodoros Giannakopoulos

Towards an efficient deep learning model for musical onset detection

In this paper, we propose an efficient and reproducible deep learning model for musical onset detection (MOD). We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack…

Sound · Computer Science 2018-06-20 Rong Gong, Xavier Serra