Related papers: Progressive Multi-Scale Self-Supervised Learning f…

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Recently, pioneering work has found that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) tasks. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science 2024-06-14 Amit Meghanani, Thomas Hain

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech…

Sound · Computer Science 2024-01-31 Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in…

Computation and Language · Computer Science 2024-04-30 Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades significantly in far-field and noisy environments. The recent development of self-supervised learning (SSL) technology can improve the ASR…

Sound · Computer Science 2022-05-05 Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-14 Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Realizing Petabyte Scale Acoustic Modeling

Large scale machine learning (ML) systems such as the Alexa automatic speech recognition (ASR) system continue to improve with increasing amounts of manually transcribed training data. Instead of scaling manual transcription to impractical…

Sound · Computer Science 2019-04-25 Sree Hari Krishnan Parthasarathi, Nitin Sivakrishnan, Pranav Ladkat, Nikko Strom

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing. The SSL model is normally pre-trained on a great variety of unlabelled data and a large model size is preferred to increase the modeling…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-08 Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-30 Wangyou Zhang, Yanmin Qian

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and…

Sound · Computer Science 2025-01-07 Shih-Heng Wang, Zih-Ching Chen, Jiatong Shi, Ming-To Chuang, Guan-Ting Lin, Kuan-Po Huang, David Harwath, Shang-Wen Li, Hung-yi Lee

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

Self-supervised learning (SSL) has transformed speech processing, yet its reliance on massive pre-training datasets remains a bottleneck. While robustness is often attributed to scale and diversity, the role of the data distribution is less…

Sound · Computer Science 2026-04-24 Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR). An ASR model with decent performance can be realized by fine-tuning an SSL model with…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-30 Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Biased Self-supervised learning for ASR

Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea…

Computation and Language · Computer Science 2022-11-07 Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland

Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued…

Sound · Computer Science 2023-05-30 Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model,…

Sound · Computer Science 2024-06-14 Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning

Self-supervised learning (SSL) approaches such as wav2vec 2.0 and HuBERT models have shown promising results in various downstream tasks in the speech community. In particular, speech representations learned by SSL models have been shown to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-11 Eesung Kim, Jae-Jin Jeon, Hyeji Seo, Hoon Kim

Self-Supervised Learning for Speaker Recognition: A study and review

Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, making them costly to scale and prone to poor…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-12 Theo Lepage, Reda Dehak

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models

Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Victor Miara, Theo Lepage, Reda Dehak