Structure-informed Positional Encoding for Music Generation
Abstract
Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants based on absolute, relative, and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. We compare against multiple baselines from the literature and demonstrate the merits of our methods using several musically motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.
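The abstract does not spell out how the encodings are constructed. As a rough illustration only, and not the formulation used in the paper, the sketch below shows one simple way structural information could enter a Transformer's positional signal. It assumes each timestep carries a hypothetical integer structure label (for example, a section index from some segmentation) and shows an absolute-style variant, which adds a standard sinusoidal encoding of the label to that of the timestep, and a relative-style variant, which builds an additive attention bias rewarding query/key pairs in the same section. The function names `structure_absolute_pe` and `same_section_bias` and the `bonus` parameter are illustrative assumptions.

```python
import math
from typing import List


def sinusoidal(pos: float, d_model: int) -> List[float]:
    """Standard Transformer sinusoidal encoding of a scalar position.

    Interleaves sin/cos pairs: index 2i holds sin(pos / 10000^(2i/d)),
    index 2i+1 holds cos(pos / 10000^(2i/d)). d_model must be even.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    enc: List[float] = []
    for i in range(d_model // 2):
        angle = pos / (10000 ** (2 * i / d_model))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def structure_absolute_pe(labels: List[int], d_model: int) -> List[List[float]]:
    """Hypothetical absolute variant (illustrative assumption, not the paper's).

    labels[t] is an assumed integer structure label for timestep t. Each row is
    the elementwise sum of the encoding of the timestep index t and the encoding
    of its label, so timesteps in the same section share the label component.
    """
    return [
        [a + b for a, b in zip(sinusoidal(t, d_model), sinusoidal(s, d_model))]
        for t, s in enumerate(labels)
    ]


def same_section_bias(labels: List[int], bonus: float = 1.0) -> List[List[float]]:
    """Hypothetical relative variant (illustrative assumption, not the paper's).

    Returns an additive attention-bias matrix whose entry (i, j) is `bonus`
    when timesteps i and j carry the same structure label, and 0.0 otherwise.
    """
    return [[bonus if si == sj else 0.0 for sj in labels] for si in labels]


# Toy A A B B A form, one assumed label per timestep.
labels = [0, 0, 1, 1, 0]
pe = structure_absolute_pe(labels, d_model=8)
bias = same_section_bias(labels)
print(len(pe), len(pe[0]))  # 5 8
print(bias[0])              # [1.0, 1.0, 0.0, 0.0, 1.0]
```

Adding the label encoding keeps the model's input dimension unchanged, while the bias matrix would be added to the attention logits before the softmax; both are only meant to make the idea of "structure-informed" positions concrete.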
Cite
@article{arxiv.2402.13301,
title = {Structure-informed Positional Encoding for Music Generation},
author = {Manvi Agarwal and Changhong Wang and Gaël Richard},
journal = {arXiv preprint arXiv:2402.13301},
year = {2024}
}