Related papers: Tempo estimation as fully self-supervised binary c…

200 papers

Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language…

Sound · Computer Science 2023-06-27 Giovana Morais, Matthew E. P. Davies, Marcelo Queiroz, Magdalena Fuentes

Self-supervised methods have emerged as a promising avenue for representation learning in recent years since they alleviate the need for labeled datasets, which are scarce and expensive to acquire. Contrastive methods are a popular…

Sound · Computer Science 2022-09-07 Elio Quinton

Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only…

Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from the limited labeled data problem, as human annotations are costly to acquire, and…

Sound · Computer Science 2021-02-08 Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Music segmentation refers to the dual problem of identifying boundaries between, and labeling, distinct music segments, e.g., the chorus, verse, bridge etc. in popular music. The performance of a range of music segmentation algorithms has…

Sound · Computer Science 2021-08-31 Matthew C. McCallum

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Automatic music transcription converts audio recordings into symbolic representations, facilitating music analysis, retrieval, and generation. A musical note is characterized by pitch, onset, and offset in an audio domain, whereas it is…

Sound · Computer Science 2025-02-19 Leekyung Kim, Sungwook Jeon, Wan Heo, Jonghun Park

Recently, some single-step systems without onset detection have shown their effectiveness in automatic musical tempo estimation. Following the success of these systems, in this paper we propose a Multi-scale Grouped Attention Network to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-06 Xiaoheng Sun, Qiqi He, Yongwei Gao, Wei Li

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal…

A fitting soundtrack can help a video better convey its content and provide a better immersive experience. This paper introduces a novel approach utilizing self-supervised learning and contrastive learning to automatically recommend audio…

Multimedia · Computer Science 2025-03-10 Shimiao Liu, Alexander Lerch

In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from…

Many time series classification tasks, where labels vary over time, are affected by label noise that also varies over time. Such noise can cause label quality to improve, worsen, or periodically change over time. We first propose and…

Machine Learning · Computer Science 2025-03-18 Sujay Nagaraj, Walter Gerych, Sana Tonekaboni, Anna Goldenberg, Berk Ustun, Thomas Hartvigsen

Audio classification has seen great progress with the increasing availability of large-scale datasets. These large datasets, however, are often only partially labeled as collecting full annotations is a tedious and expensive process. This…

Sound · Computer Science 2021-11-29 Siddharth Gururani, Alexander Lerch

Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-27 Frank Cwitkowitz, Zhiyao Duan

Connecting large libraries of digitized audio recordings to their corresponding sheet music images has long been a motivation for researchers to develop new cross-modal retrieval systems. In recent years, retrieval systems based on…

Information Retrieval · Computer Science 2019-06-27 Stefan Balke, Matthias Dorfer, Luis Carvalho, Andreas Arzt, Gerhard Widmer

Quantitative analysis of commonalities and differences between recorded music performances is an increasingly common task in computational musicology. A typical scenario involves manual annotation of different recordings of the same piece…

Multimedia · Computer Science 2020-09-28 Thassilo Gadermaier, Gerhard Widmer

In the context of environmental sound classification, the adaptability of systems is key: which sound classes are interesting depends on the context and the user's needs. Recent advances in text-to-audio retrieval allow for zero-shot audio…

Sound · Computer Science 2023-08-21 Saksham Singh Kushwaha, Magdalena Fuentes

The increasing level of sound pollution in marine environments poses an increased threat to ocean health, making it crucial to monitor underwater noise. By monitoring this noise, the sources responsible for this pollution can be mapped.…

Sound · Computer Science 2025-05-20 Hilde I. Hummel, Arwin Gansekoele, Sandjai Bhulai, Rob van der Mei

This paper addresses the problem of cross-modal musical piece identification and retrieval: finding the appropriate recording(s) from a database given a sheet music query, and vice versa, working directly with audio and scanned sheet music…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-27 Luis Carvalho, Gerhard Widmer

We propose a model to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation. We acknowledge the fact that obtaining ground truth annotations at the required temporal and frequency resolution is a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-07 Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirović