CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Chutong Meng; Junyi Ao; Tom Ko; Mingxuan Wang; Haizhou Li

Subjects: Sound; Audio and Speech Processing. Version v3, 2023-07-06.

Abstract

Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance into a sequence of discrete codes and perform code representation learning, in which we predict the code representations from a masked view of the original speech input. Unlike prior self-distillation approaches, in which the teacher and the student share the same modality, the model we train predicts representations derived from a different modality, the discrete codes. CoBERT surpasses the most recent state-of-the-art performance on the ASR task and brings significant improvements on the SUPERB speech translation (ST) task. Our code and models are released at https://github.com/mct10/CoBERT.
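The training idea in the abstract (speech to codes, then masked prediction of code representations) can be sketched as a single toy training step. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not specify the quantizer, the teacher, or the loss, so here we assume nearest-centroid (k-means-style) codes, a lookup table standing in for the code teacher, zeroed-out frames as the masked view, and a mean-squared-error loss on masked positions only. All function names are made up for illustration.

```python
import random

def quantize(frames, centroids):
    """Assumption: map each feature frame to its nearest centroid index (k-means-style codes)."""
    codes = []
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
        codes.append(dists.index(min(dists)))
    return codes

def code_teacher(codes, code_table):
    """Hypothetical teacher: one vector per code (stand-in for a model that encodes code sequences)."""
    return [list(code_table[c]) for c in codes]

def mask_frames(frames, mask_prob, rng):
    """Assumption: build the masked view by zeroing a random subset of speech frames."""
    mask = [rng.random() < mask_prob for _ in frames]
    masked = [[0.0] * len(f) if m else list(f) for f, m in zip(frames, mask)]
    return masked, mask

def masked_regression_loss(preds, targets, mask):
    """Mean squared error per frame, averaged over masked frames only."""
    errs = [sum((p - t) ** 2 for p, t in zip(pv, tv)) / len(pv)
            for pv, tv, m in zip(preds, targets, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0

def training_step(frames, centroids, code_table, student, mask_prob=0.5, seed=0):
    """One hypothetical step: the student sees masked speech and regresses code-derived targets."""
    rng = random.Random(seed)
    codes = quantize(frames, centroids)                 # speech -> discrete codes
    targets = code_teacher(codes, code_table)           # teacher works in the code modality
    masked, mask = mask_frames(frames, mask_prob, rng)  # masked view of the speech input
    preds = student(masked)                             # student works in the speech modality
    return masked_regression_loss(preds, targets, mask)

if __name__ == "__main__":
    centroids = [[0.0, 0.0], [1.0, 1.0]]
    code_table = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [-0.2, 0.1]]
    zero_student = lambda xs: [[0.0, 0.0, 0.0] for _ in xs]  # toy untrained student
    print(quantize(frames, centroids))
    print(training_step(frames, centroids, code_table, zero_student, mask_prob=1.0))
```

The point the sketch tries to make visible is the cross-modal setup: the targets are computed only from the discrete codes, while the student only ever sees the (masked) continuous speech features, so it must infer code-level information from audio.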

Keywords

BERT, pre-trained language model, self-supervised learning

Cite

@article{arxiv.2210.04062,
  title  = {CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning},
  author = {Chutong Meng and Junyi Ao and Tom Ko and Mingxuan Wang and Haizhou Li},
  journal= {arXiv preprint arXiv:2210.04062},
  year   = {2023}
}

Comments

Accepted by Interspeech 2023