Related papers

Related papers: Progressive Multi-Scale Self-Supervised Learning f…

200 papers

Recently, pioneering work has found that speech pre-trained models can solve full-stack speech processing tasks, because the model uses its bottom layers to learn speaker-related information and its top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-17 · Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results on the automatic speech recognition (ASR) task. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-03 · Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science · 2024-06-14 · Amit Meghanani, Thomas Hain

Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech…

Sound · Computer Science · 2024-01-31 · Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in…

Computation and Language · Computer Science · 2024-04-30 · Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades significantly in far-field and noisy environments. The recent development of self-supervised learning (SSL) technology can improve the ASR…

Sound · Computer Science · 2022-05-05 · Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-14 · Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Large-scale machine learning (ML) systems such as the Alexa automatic speech recognition (ASR) system continue to improve with increasing amounts of manually transcribed training data. Instead of scaling manual transcription to impractical…

Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing. The SSL model is normally pre-trained on a great variety of unlabelled data, and a large model size is preferred to increase the modeling…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-05-08 · Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-30 · Wangyou Zhang, Yanmin Qian

The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and…

Self-supervised learning (SSL) has transformed speech processing, yet its reliance on massive pre-training datasets remains a bottleneck. While robustness is often attributed to scale and diversity, the role of the data distribution is less…

Sound · Computer Science · 2026-04-24 · Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR). An ASR model with decent performance can be realized by fine-tuning an SSL model with…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-08-30 · Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-11-02 · Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea…

Computation and Language · Computer Science · 2022-11-07 · Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland

Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued…

Sound · Computer Science · 2023-05-30 · Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model,…

Self-supervised learning (SSL) approaches such as wav2vec 2.0 and HuBERT models have shown promising results in various downstream tasks in the speech community. In particular, speech representations learned by SSL models have been shown to…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-11 · Eesung Kim, Jae-Jin Jeon, Hyeji Seo, Hoon Kim

Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, making them costly to scale and prone to poor…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-12 · Theo Lepage, Reda Dehak

Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-25 · Victor Miara, Theo Lepage, Reda Dehak