
Related papers: CA-SSLR: Condition-Aware Self-Supervised Learning …

200 papers

Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is…

Computation and Language · Computer Science 2024-06-14 Amit Meghanani, Thomas Hain

Recent pioneering work has found that speech pre-trained models can solve full-stack speech processing tasks because the model uses its bottom layers to learn speaker-related information and its top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Victor Miara, Theo Lepage, Reda Dehak

Self-supervised learning (SSL) foundation models have emerged as powerful, domain-agnostic, general-purpose feature extractors applicable to a wide range of tasks. Such models pre-trained on human speech have demonstrated high…

Machine Learning · Computer Science 2025-01-22 Eklavya Sarkar, Mathew Magimai.-Doss

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its capability of extracting rich representations learned from massive unlabeled data. On the other hand, the use of weakly-supervised data is less…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-30 Wangyou Zhang, Yanmin Qian

The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and…

We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-08 Samuel Kessler, Bethan Thomas, Salah Karout

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

Speech representation learning plays a vital role in speech processing. Among its approaches, self-supervised learning (SSL) has become an important research direction. It has been shown that an SSL pretraining model can achieve excellent performance…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-20 Yi-Chen Chen, Shu-wen Yang, Cheng-Kuang Lee, Simon See, Hung-yi Lee

Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-14 Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the common assumption made in literature is that…

Computation and Language · Computer Science 2023-05-19 Ashish Seth, Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh

Pseudo-labeling is the most widely adopted method for pre-training automatic speech recognition (ASR) models. However, its performance suffers from the supervised teacher model's degrading quality in low-resource setups and under domain transfer.…

Computation and Language · Computer Science 2021-03-10 Alex Xiao, Christian Fuegen, Abdelrahman Mohamed

This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128…

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the…

Computation and Language · Computer Science 2022-12-06 Ankita Pasad, Ju-Chieh Chou, Karen Livescu

Speech representation learning approaches for non-semantic tasks such as language recognition have either explored supervised embedding extraction methods using a classifier model or self-supervised representation learning approaches using…

Computation and Language · Computer Science 2023-06-08 Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

Self-supervised automatic speech recognition (SSL-ASR) is an ASR approach that uses speech encoders pretrained on large amounts of unlabeled audio (e.g., wav2vec2.0 or HuBERT) and then fine-tunes them with limited labeled data to perform…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-07 Eyal Cohen, Bhiksha Raj, Joseph Keshet

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) task. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

The speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and have thus attracted much interest for application to various downstream tasks. In this paper, we…

Sound · Computer Science 2022-01-25 Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur