Related papers: Multi-Task Self-Supervised Pre-Training for Music …

200 papers

Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data. It is particularly compelling in the music domain, where obtaining labeled data is time-consuming,…

Sound · Computer Science 2024-04-16 Gabriel Meseguer-Brocal, Dorian Desblancs, Romain Hennequin

In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal…

While deep learning has been incredibly successful in modeling tasks with large, carefully curated labeled datasets, its application to problems with limited labeled data remains a challenge. The aim of the present work is to improve the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-29 Tyler Lee, Ting Gong, Suchismita Padhy, Andrew Rouditchenko, Anthony Ndirango

Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-23 Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba

In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from…

Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio…

Sound · Computer Science 2023-07-25 Peranut Nimitsurachat, Peter Washington

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Audio and speech self-supervised encoder models are now widely used for a lot of different tasks. Many of these models are often trained on clean segmented speech content such as LibriSpeech. In this paper, we look into how the pretraining…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-13 Valentin Pelloin, Lina Bekkali, Reda Dehak, David Doukhan

Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality…

Sound · Computer Science 2024-09-10 Bing Yang, Xiaofei Li

Labelled data are limited and self-supervised learning is one of the most important approaches for reducing labelling requirements. While it has been extensively explored in the image domain, it has so far not received the same amount of…

Sound · Computer Science 2025-05-16 Jingyong Liang, Bernd Meyer, Isaac Ning Lee, Thanh-Toan Do

Self-supervised pretraining has been observed to be effective at improving feature representations for transfer learning, leveraging large amounts of unlabelled data. This review summarizes recent research into its usage in X-ray, computed…

Machine Learning · Computer Science 2023-09-07 Blake VanBerlo, Jesse Hoey, Alexander Wong

We present Music Tagging Transformer that is trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of the extracted…

Sound · Computer Science 2021-11-29 Minz Won, Keunwoo Choi, Xavier Serra

In this work we introduce a self-supervised pretraining framework for transformers on functional Magnetic Resonance Imaging (fMRI) data. First, we pretrain our architecture on two self-supervised tasks simultaneously to teach the model a…

Machine Learning · Computer Science 2023-05-17 Sean Paulsen, Michael Casey

We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the same attribute space,…

Machine Learning · Computer Science 2023-11-10 Tomoharu Iwata, Atsutoshi Kumagai

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and…

Computation and Language · Computer Science 2018-04-10 Isabelle Augenstein, Sebastian Ruder, Anders Søgaard

Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the…

Sound · Computer Science 2020-03-24 Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps, Björn W. Schuller

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work…

Sound · Computer Science 2023-09-06 Yuya Yamamoto

Self-supervised pre-training using so-called "pretext" tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations, by pre-training a model to…

Sound · Computer Science 2021-05-05 Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve…

Sound · Computer Science 2022-01-02 Yi Li, Yang Sun, Syed Mohsen Naqvi

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for…

Machine Learning · Computer Science 2018-07-12 Veronica Morfi, Dan Stowell