Related papers: Unsupervised Speech Recognition

wav2vec: Unsupervised Pre-training for Speech Recognition

We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model…

Computation and Language · Computer Science · 2019-09-12 · Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the…

Computation and Language · Computer Science · 2020-10-23 · Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

Simple and Effective Unsupervised Speech Translation

The amount of labeled data to train models for speech tasks is limited for most languages, however, the data scarcity is exacerbated for speech translation which requires labeled data covering two different languages. To address this issue,…

Computation and Language · Computer Science · 2022-10-20 · Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino

Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models

Labeled audio data is insufficient to build satisfying speech recognition systems for most of the languages in the world. There have been some zero-resource methods trying to perform phoneme or word-level speech recognition without labeled…

Computation and Language · Computer Science · 2025-01-14 · Haoyu Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan

Self-training and Pre-training are Complementary for Speech Recognition

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data. However, it is not clear whether they learn similar patterns or if they can be effectively…

Machine Learning · Computer Science · 2020-10-23 · Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Wav2Vec2.0 on the Edge: Performance Evaluation

Wav2Vec2.0 is a state-of-the-art model which learns speech representations through unlabeled speech data, aka, self supervised learning. The pretrained model is then fine tuned on small amounts of labeled data to use it for speech-to-text…

Sound · Computer Science · 2022-02-15 · Santosh Gondi

Improved Language Identification Through Cross-Lingual Self-Supervised Learning

Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech…

Computation and Language · Computer Science · 2021-10-19 · Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli

Effectiveness of self-supervised pre-training for speech recognition

We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization. We find the former to be more accurate since it builds a good vocabulary of the…

Computation and Language · Computer Science · 2020-05-20 · Alexei Baevski, Michael Auli, Abdelrahman Mohamed

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching

We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can only access to the input utterances and a phoneme language model estimated from a non-overlapping…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-12-27 · Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu

Unsupervised ASR via Cross-Lingual Pseudo-Labeling

Recent work has shown that it is possible to train an unsupervised automatic speech recognition (ASR) system using only unpaired audio and text. Existing unsupervised ASR methods assume that no labeled data can be used for…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-02-19 · Tatiana Likhomanenko, Loren Lugosch, Ronan Collobert

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-06-25 · Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

Towards End-to-end Unsupervised Speech Recognition

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language. However, existing methods still heavily rely on hand-crafted pre-processing. Similar to the trend of…

Computation and Language · Computer Science · 2022-06-16 · Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data. However, in many cases there is labeled data available for related…

Computation and Language · Computer Science · 2021-09-27 · Qiantong Xu, Alexei Baevski, Michael Auli

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

Producing a large amount of annotated speech data for training ASR systems remains difficult for more than 95% of languages all over the world which are low-resourced. However, we note human babies start to learn the language by the sounds…

Computation and Language · Computer Science · 2019-04-11 · Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Lin-shan Lee

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

Self-supervision has recently shown great promise for learning visual and auditory speech representations from unlabelled data. In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-03 · Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Sound · Computer Science · 2022-01-10 · Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

Towards Unsupervised Speech Recognition Without Pronunciation Models

Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech…

Computation and Language · Computer Science · 2025-01-10 · Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

Unsupervised Automatic Speech Recognition: A Review

Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of…

Computation and Language · Computer Science · 2022-03-22 · Hanan Aldarmaki, Asad Ullah, Nazar Zaki

Towards Unsupervised Speech Recognition at the Syllable-Level

Training speech recognizers with unpaired speech and text -- known as unsupervised speech recognition (UASR) -- is a crucial step toward extending ASR to low-resource languages in the long-tail distribution and enabling multimodal learning…

Computation and Language · Computer Science · 2025-10-07 · Liming Wang, Junrui Ni, Kai-Wei Chang, Saurabhchand Bhati, David Harwath, Mark Hasegawa-Johnson, James R. Glass

Analyzing the Robustness of Unsupervised Speech Recognition

Unsupervised speech recognition (unsupervised ASR) aims to learn the ASR system with non-parallel speech and text corpus only. Wav2vec-U has shown promising results in unsupervised ASR by self-supervised speech representations coupled with…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-27 · Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao