Related papers: CA-SSLR: Condition-Aware Self-Supervised Learning …

LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science 2024-06-14 Amit Meghanani , Thomas Hain

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang , Yu Wu , Sanyuan Chen , Shujie Liu , Jinyu Li , Yao Qian , Zhenglu Yang

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models

Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Victor Miara , Theo Lepage , Reda Dehak

Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing

Self-supervised learning (SSL) foundation models have emerged as powerful, domain-agnostic, general-purpose feature extractors applicable to a wide range of tasks. Such models pre-trained on human speech have demonstrated high…

Machine Learning · Computer Science 2025-01-22 Eklavya Sarkar , Mathew Magimai. -Doss

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-30 Wangyou Zhang , Yanmin Qian

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and…

Sound · Computer Science 2025-01-07 Shih-Heng Wang , Zih-Ching Chen , Jiatong Shi , Ming-To Chuang , Guan-Ting Lin , Kuan-Po Huang , David Harwath , Shang-Wen Li , Hung-yi Lee

An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning

We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-08 Samuel Kessler , Bethan Thomas , Salah Karout

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan , Tan Liu , Hang Chen , Jia Pan , Cong Liu , Zhongfu Ye

Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning

Speech representation learning plays a vital role in speech processing. Among them, self-supervised learning (SSL) has become an important research direction. It has been shown that an SSL pretraining model can achieve excellent performance…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-20 Yi-Chen Chen , Shu-wen Yang , Cheng-Kuang Lee , Simon See , Hung-yi Lee

Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-14 Salah Zaiem , Robin Algayres , Titouan Parcollet , Slim Essid , Mirco Ravanelli

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the common assumption made in literature is that…

Computation and Language · Computer Science 2023-05-19 Ashish Seth , Lodagala V S V Durga Prasad , Sreyan Ghosh , S. Umesh

Contrastive Semi-supervised Learning for ASR

Pseudo-labeling is the most adopted method for pre-training automatic speech recognition (ASR) models. However, its performance suffers from the supervised teacher model's degrading quality in low-resource setups and under domain transfer.…

Computation and Language · Computer Science 2021-03-10 Alex Xiao , Christian Fuegen , Abdelrahman Mohamed

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128…

Computation and Language · Computer Science 2021-12-17 Arun Babu , Changhan Wang , Andros Tjandra , Kushal Lakhotia , Qiantong Xu , Naman Goyal , Kritika Singh , Patrick von Platen , Yatharth Saraf , Juan Pino , Alexei Baevski , Alexis Conneau , Michael Auli

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed , Hung-yi Lee , Lasse Borgholt , Jakob D. Havtorn , Joakim Edin , Christian Igel , Katrin Kirchhoff , Shang-Wen Li , Karen Livescu , Lars Maaløe , Tara N. Sainath , Shinji Watanabe

Layer-wise Analysis of a Self-supervised Speech Representation Model

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the…

Computation and Language · Computer Science 2022-12-06 Ankita Pasad , Ju-Chieh Chou , Karen Livescu

Label Aware Speech Representation Learning For Language Identification

Speech representation learning approaches for non-semantic tasks such as language recognition have either explored supervised embedding extraction methods using a classifier model or self-supervised representation learning approaches using…

Computation and Language · Computer Science 2023-06-08 Shikhar Vashishth , Shikhar Bharadwaj , Sriram Ganapathy , Ankur Bapna , Min Ma , Wei Han , Vera Axelrod , Partha Talukdar

Large Language Model Guided Decoding for Self-Supervised Speech Recognition

Self-supervised automatic speech recognition (SSL-ASR) is an ASR approach that uses speech encoders pretrained on large amounts of unlabeled audio (e.g., wav2vec2.0 or HuBERT) and then fine-tunes them with limited labeled data to perform…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-07 Eyal Cohen , Bhiksha Raj , Joseph Keshet

Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) task. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Zehan Li , Yan Yang , Xueqing Li , Jian Kang , Xiao-Lei Zhang , Jie Li

Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

The speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and thus attract a lot of interest to be applied for various downstream tasks. In this paper, we…

Sound · Computer Science 2022-01-25 Zhengyang Chen , Sanyuan Chen , Yu Wu , Yao Qian , Chengyi Wang , Shujie Liu , Yanmin Qian , Michael Zeng

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Zili Huang , Desh Raj , Paola García , Sanjeev Khudanpur