Related papers: Audio Self-supervised Learning: A Survey

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance. However, the process of collecting and labeling such data can be expensive and time-consuming. Self-supervised…

Machine Learning · Computer Science 2024-07-16 Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, Dacheng Tao

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These great improvements have been reported mostly based on highly curated datasets such as LibriSpeech for non-streaming…

Sound · Computer Science 2022-05-19 Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

Self-Supervised Multi-View Learning for Disentangled Music Audio Representations

Self-supervised learning (SSL) offers a powerful way to learn robust, generalizable representations without labeled data. In music, where labeled data is scarce, existing SSL methods typically use generated supervision and multi-view…

Sound · Computer Science 2024-11-06 Julia Wilkins, Sivan Ding, Magdalena Fuentes, Juan Pablo Bello

Self-Supervised Learning for Few-Shot Bird Sound Classification

Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost. This is pertinent in bioacoustics, where biologists…

Sound · Computer Science 2024-02-12 Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Self-Supervised Learning for Audio-Based Emotion Recognition

Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio…

Sound · Computer Science 2023-07-25 Peranut Nimitsurachat, Peter Washington

Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding

The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However,…

Sound · Computer Science 2024-02-07 Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing…

Computation and Language · Computer Science 2024-08-28 Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio

Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train…

Sound · Computer Science 2022-11-23 Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane

Self-supervised learning for autonomous vehicles perception: A conciliation between analytical and learning methods

Nowadays, supervised deep learning techniques yield the best state-of-the-art prediction performances for a wide variety of computer vision tasks. However, such supervised techniques generally require a large amount of manually labeled…

Computer Vision and Pattern Recognition · Computer Science 2020-06-09 Florent Chiaroni, Mohamed-Cherif Rahal, Nicolas Hueber, Frederic Dufaux

Self-supervised models of audio effectively explain human cortical responses to speech

Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either…

Computation and Language · Computer Science 2022-05-31 Aditya R. Vaidya, Shailee Jain, Alexander G. Huth

Self-Supervised Learning for Speaker Recognition: A study and review

Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, making them costly to scale and prone to poor…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-12 Theo Lepage, Reda Dehak

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work…

Sound · Computer Science 2023-09-06 Yuya Yamamoto

A Pre-training Framework that Encodes Noise Information for Speech Quality Assessment

Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-08 Subrina Sultana, Donald S. Williamson

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to…

Computation and Language · Computer Science 2022-11-01 Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition. In this paper, we study which factor leads to the success of…

Computation and Language · Computer Science 2022-06-28 Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Self-supervised learning (SSL), which utilizes the input data itself for representation learning, has achieved state-of-the-art results for various downstream speech tasks. However, most of the previous studies focused on offline…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-11 Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years witness great successes in applying self-supervised learning in…

Computation and Language · Computer Science 2021-10-13 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

Self-Supervised Learning for Image Segmentation: A Comprehensive Survey

Supervised learning demands large amounts of precisely annotated data to achieve promising results. Such data curation is labor-intensive and imposes significant overhead regarding time and costs. Self-supervised learning (SSL) partially…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Thangarajah Akilan, Nusrat Jahan, Wandong Zhang

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Self-supervised pre-trained audio networks have seen widespread adoption in real-world systems, particularly in multi-modal large language models. These networks are often employed in a frozen state, under the assumption that the SSL…

Sound · Computer Science 2025-06-17 Tony Alex, Sara Ahmed, Armin Mustafa, Muhammad Awais, Philip JB Jackson

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe