Related papers: Tempo estimation as fully self-supervised binary c…

Tempo vs. Pitch: understanding self-supervised tempo estimation

Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language…

Sound · Computer Science 2023-06-27 Giovana Morais, Matthew E. P. Davies, Marcelo Queiroz, Magdalena Fuentes

Equivariant Self-Supervision for Musical Tempo Estimation

Self-supervised methods have emerged as a promising avenue for representation learning in the recent years since they alleviate the need for labeled datasets, which are scarce and expensive to acquire. Contrastive methods are a popular…

Sound · Computer Science 2022-09-07 Elio Quinton

Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only…

Sound · Computer Science 2024-01-18 Matthew C. McCallum, Florian Henkel, Jaehun Kim, Samuel E. Sandberg, Matthew E. P. Davies

Multi-Task Self-Supervised Pre-Training for Music Classification

Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from limited labeled data problem, as human annotations are costly to acquire, and…

Sound · Computer Science 2021-02-08 Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Unsupervised Learning of Deep Features for Music Segmentation

Music segmentation refers to the dual problem of identifying boundaries between, and labeling, distinct music segments, e.g., the chorus, verse, bridge etc. in popular music. The performance of a range of music segmentation algorithms has…

Sound · Computer Science 2021-08-31 Matthew C. McCallum

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Note-Level Singing Melody Transcription for Time-Aligned Musical Score Generation

Automatic music transcription converts audio recordings into symbolic representations, facilitating music analysis, retrieval, and generation. A musical note is characterized by pitch, onset, and offset in an audio domain, whereas it is…

Sound · Computer Science 2025-02-19 Leekyung Kim, Sungwook Jeon, Wan Heo, Jonghun Park

Musical Tempo Estimation Using a Multi-scale Network

Recently, some single-step systems without onset detection have shown their effectiveness in automatic musical tempo estimation. Following the success of these systems, in this paper we propose a Multi-scale Grouped Attention Network to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-06 Xiaoheng Sun, Qiqi He, Yongwei Gao, Wei Li

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal…

Sound · Computer Science 2022-10-11 Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann

Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings

A fitting soundtrack can help a video better convey its content and provide a better immersive experience. This paper introduces a novel approach utilizing self-supervised learning and contrastive learning to automatically recommend audio…

Multimedia · Computer Science 2025-03-10 Shimiao Liu, Alexander Lerch

Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging

In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from…

Sound · Computer Science 2023-04-18 Taketo Akama, Hiroaki Kitano, Katsuhiro Takematsu, Yasushi Miyajima, Natalia Polouliakh

Learning under Temporal Label Noise

Many time series classification tasks, where labels vary over time, are affected by label noise that also varies over time. Such noise can cause label quality to improve, worsen, or periodically change over time. We first propose and…

Machine Learning · Computer Science 2025-03-18 Sujay Nagaraj, Walter Gerych, Sana Tonekaboni, Anna Goldenberg, Berk Ustun, Thomas Hartvigsen

Semi-Supervised Audio Classification with Partially Labeled Data

Audio classification has seen great progress with the increasing availability of large-scale datasets. These large datasets, however, are often only partially labeled as collecting full annotations is a tedious and expensive process. This…

Sound · Computer Science 2021-11-29 Siddharth Gururani, Alexander Lerch

Toward Fully Self-Supervised Multi-Pitch Estimation

Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-27 Frank Cwitkowitz, Zhiyao Duan

Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval

Connecting large libraries of digitized audio recordings to their corresponding sheet music images has long been a motivation for researchers to develop new cross-modal retrieval systems. In recent years, retrieval systems based on…

Information Retrieval · Computer Science 2019-06-27 Stefan Balke, Matthias Dorfer, Luis Carvalho, Andreas Arzt, Gerhard Widmer

A Study of Annotation and Alignment Accuracy for Performance Comparison in Complex Orchestral Music

Quantitative analysis of commonalities and differences between recorded music performances is an increasingly common task in computational musicology. A typical scenario involves manual annotation of different recordings of the same piece…

Multimedia · Computer Science 2020-09-28 Thassilo Gadermaier, Gerhard Widmer

A Multimodal Prototypical Approach for Unsupervised Sound Classification

In the context of environmental sound classification, the adaptability of systems is key: which sound classes are interesting depends on the context and the user's needs. Recent advances in text-to-audio retrieval allow for zero-shot audio…

Sound · Computer Science 2023-08-21 Saksham Singh Kushwaha, Magdalena Fuentes

The Computation of Generalized Embeddings for Underwater Acoustic Target Recognition using Contrastive Learning

The increasing level of sound pollution in marine environments poses an increased threat to ocean health, making it crucial to monitor underwater noise. By monitoring this noise, the sources responsible for this pollution can be mapped.…

Sound · Computer Science 2025-05-20 Hilde I. Hummel, Arwin Gansekoele, Sandjai Bhulai, Rob van der Mei

Exploiting Temporal Dependencies for Cross-Modal Music Piece Identification

This paper addresses the problem of cross-modal musical piece identification and retrieval: finding the appropriate recording(s) from a database given a sheet music query, and vice versa, working directly with audio and scanned sheet music…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-27 Luis Carvalho, Gerhard Widmer

SPICE: Self-supervised Pitch Estimation

We propose a model to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation. We acknowledge the fact that obtaining ground truth annotations at the required temporal and frequency resolution is a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-07 Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirović