Structure-informed Positional Encoding for Music Generation

Sound | 2024-02-29 | v2 | Artificial Intelligence | Audio and Speech Processing

Abstract

Music generated by deep learning methods often lacks coherence and long-term organization, even though multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants, based on absolute, relative, and non-stationary positional information. We evaluate them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. For comparison, we choose multiple baselines from the literature and demonstrate the merits of our methods using several musically motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.

Cite

@article{arxiv.2402.13301,
  title  = {Structure-informed Positional Encoding for Music Generation},
  author = {Manvi Agarwal and Changhong Wang and Gaël Richard},
  journal= {arXiv preprint arXiv:2402.13301},
  year   = {2024}
}