Related papers: Joint Encoder-Decoder Self-Supervised Pre-training…

200 papers

In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data. However, compared with tasks such as speech recognition (ASR),…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-14 Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu

Self-supervised learned (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks. Given the effectiveness of such models, it is advantageous to use them in conventional ASR systems. While some…

Computation and Language · Computer Science 2024-04-22 Darshan Prabhu, Sai Ganesh Mirishkar, Pankaj Wasnik

Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech…

Sound · Computer Science 2024-01-31 Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years witness great successes in applying self-supervised learning in…

Computation and Language · Computer Science 2021-10-13 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing. The SSL model is normally pre-trained on a great variety of unlabelled data and a large model size is preferred to increase the modeling…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-08 Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang

This paper proposes a novel technique to obtain better downstream ASR performance from a joint encoder-decoder self-supervised model when trained with speech pooled from two different channels (narrow and wide band). The joint…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-06 Vrunda N. Sukhadia, A. Arunkumar, S. Umesh

Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations would not be optimal for various speech-processing tasks since…

Sound · Computer Science 2023-06-26 Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe

Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued…

Sound · Computer Science 2023-05-30 Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe

Self-supervised learning (SSL) has led to great strides in speech processing. However, the resources needed to train these models has become prohibitively large as they continue to scale. Currently, only a few groups with substantial…

Computation and Language · Computer Science 2023-06-13 William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe

Self-supervised automatic speech recognition (SSL-ASR) is an ASR approach that uses speech encoders pretrained on large amounts of unlabeled audio (e.g., wav2vec2.0 or HuBERT) and then fine-tunes them with limited labeled data to perform…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-07 Eyal Cohen, Bhiksha Raj, Joseph Keshet

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase,…

Computation and Language · Computer Science 2021-06-15 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to…

Computation and Language · Computer Science 2022-06-22 Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These great improvements have been reported mostly based on highly curated datasets such as LibriSpeech for non-streaming…

Sound · Computer Science 2022-05-19 Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of these…

Computation and Language · Computer Science 2023-02-21 A Arunkumar, Vrunda N Sukhadia, S. Umesh

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In addition, ASR performance could be further improved if the model is dedicated to audio content information learning…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-08 Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye

The excellent generalization ability of self-supervised learning (SSL) for speech foundation models has garnered significant attention. HuBERT is a successful example that utilizes offline clustering to convert speech features into discrete…

Computation and Language · Computer Science 2023-06-16 Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

Pre-trained models, especially self-supervised learning (SSL) models, have demonstrated impressive results in automatic speech recognition (ASR) task. While most applications of SSL models focus on leveraging continuous representations as…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Zehan Li, Yan Yang, Xueqing Li, Jian Kang, Xiao-Lei Zhang, Jie Li

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the…

Sound · Computer Science 2022-06-22 Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

Self-supervised learning (SSL) has achieved great success in speech-related tasks. While Transformer and Conformer architectures have dominated SSL backbones, encoders like Zipformer, which excel in automatic speech recognition (ASR),…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-25 Yifan Yang, Jianheng Zhuo, Zengrui Jin, Ziyang Ma, Xiaoyu Yang, Zengwei Yao, Liyong Guo, Wei Kang, Fangjun Kuang, Long Lin, Daniel Povey, Xie Chen