Related papers: Audio Self-supervised Learning: A Survey

200 papers

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance. However, the process of collecting and labeling such data can be expensive and time-consuming. Self-supervised…

Machine Learning · Computer Science 2024-07-16 Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, Dacheng Tao

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These great improvements have been reported mostly based on highly curated datasets such as LibriSpeech for non-streaming…

Sound · Computer Science 2022-05-19 Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

Self-supervised learning (SSL) offers a powerful way to learn robust, generalizable representations without labeled data. In music, where labeled data is scarce, existing SSL methods typically use generated supervision and multi-view…

Sound · Computer Science 2024-11-06 Julia Wilkins, Sivan Ding, Magdalena Fuentes, Juan Pablo Bello

Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost. This is pertinent in bioacoustics, where biologists…

Sound · Computer Science 2024-02-12 Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio…

Sound · Computer Science 2023-07-25 Peranut Nimitsurachat, Peter Washington

The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However,…

Sound · Computer Science 2024-02-07 Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing…

Computation and Language · Computer Science 2024-08-28 Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train…

Nowadays, supervised deep learning techniques yield the best state-of-the-art prediction performances for a wide variety of computer vision tasks. However, such supervised techniques generally require a large amount of manually labeled…

Computer Vision and Pattern Recognition · Computer Science 2020-06-09 Florent Chiaroni, Mohamed-Cherif Rahal, Nicolas Hueber, Frederic Dufaux

Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either…

Computation and Language · Computer Science 2022-05-31 Aditya R. Vaidya, Shailee Jain, Alexander G. Huth

Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, making them costly to scale and prone to poor…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-12 Theo Lepage, Reda Dehak

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work…

Sound · Computer Science 2023-09-06 Yuya Yamamoto

Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-08 Subrina Sultana, Donald S. Williamson

We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to…

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition. In this paper, we study which factor leads to the success of…

Computation and Language · Computer Science 2022-06-28 Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Self-supervised learning (SSL), which utilizes the input data itself for representation learning, has achieved state-of-the-art results for various downstream speech tasks. However, most of the previous studies focused on offline…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-11 Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years witness great successes in applying self-supervised learning in…

Computation and Language · Computer Science 2021-10-13 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

Supervised learning demands large amounts of precisely annotated data to achieve promising results. Such data curation is labor-intensive and imposes significant overhead regarding time and costs. Self-supervised learning (SSL) partially…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Thangarajah Akilan, Nusrat Jahan, Wandong Zhang

Self-supervised pre-trained audio networks have seen widespread adoption in real-world systems, particularly in multi-modal large language models. These networks are often employed in a frozen state, under the assumption that the SSL…

Sound · Computer Science 2025-06-17 Tony Alex, Sara Ahmed, Armin Mustafa, Muhammad Awais, Philip JB Jackson

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…
