Related papers: CoBERT: Self-Supervised Speech Representation Lear…

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase,…

Computation and Language · Computer Science 2021-06-15 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation

For self-supervised speech processing, it is crucial to use pretrained models as speech representation extractors. In recent works, increasing the size of the model has been utilized in acoustic model training in order to achieve better…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-04 Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-yi Lee

Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT

Self-supervised speech representation learning has become essential for extracting meaningful features from untranscribed audio. Recent advances highlight the potential of deriving discrete symbols from the features correlated with…

Computation and Language · Computer Science 2024-09-17 Ryota Komatsu, Takahiro Shinozaki

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite the success…

Computation and Language · Computer Science 2022-04-29 Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee

Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

Self-supervised pre-trained speech models were shown effective for various downstream speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo-labels, the resulting representations are only effective for the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-09 Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng

Autoregressive Co-Training for Learning Discrete Speech Representations

While several self-supervised approaches for learning discrete speech representation have been proposed, it is unclear how these seemingly similar approaches relate to each other. In this paper, we consider a generative model with discrete…

Computation and Language · Computer Science 2022-11-01 Sung-Lin Yeh, Hao Tang

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Human language can be expressed in either written or spoken form, i.e. text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, the quest for speech pre-trained models to leverage unpaired text has…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-06 Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought

Unsupervised sentence representation learning aims to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the reliance on labeled data. Recent strides within this domain have been…

Computation and Language · Computer Science 2024-06-21 Bowen Zhang, Kehua Chang, Chunping Li

BrainBERT: Self-supervised representation learning for intracranial recordings

We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts,…

Machine Learning · Computer Science 2023-03-01 Christopher Wang, Vighnesh Subramaniam, Adam Uri Yaari, Gabriel Kreiman, Boris Katz, Ignacio Cases, Andrei Barbu

Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio

Self-supervised learning has been used to leverage unlabelled data, improving accuracy and generalisation of speech systems through the training of representation models. While many recent works have sought to produce effective…

Computation and Language · Computer Science 2023-10-18 Antoni Dimitriadis, Siqi Pan, Vidhyasaharan Sethu, Beena Ahmed

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to…

Computation and Language · Computer Science 2022-06-22 Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-22 Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

CCBERT: Self-Supervised Code Change Representation Learning

Numerous code changes are made by developers in their daily work, and a superior representation of code changes is desired for effective code change analysis. Recently, Hoang et al. proposed CC2Vec, a neural network-based approach that…

Software Engineering · Computer Science 2023-09-28 Xin Zhou, Bowen Xu, DongGyun Han, Zhou Yang, Junda He, David Lo

MetricBERT: Text Representation Learning via Self-Supervised Triplet Training

We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the "traditional" masked-language task. We focus on downstream tasks of learning similarities for…

Computation and Language · Computer Science 2022-08-16 Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Yoni Weill, Noam Koenigstein

SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction

Sign language processing has traditionally relied on task-specific models, limiting the potential for transfer learning across tasks. Pre-training methods for sign language have typically focused on either supervised pre-training, which…

Computation and Language · Computer Science 2025-07-04 Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu, Alexander H. Liu

VideoBERT: A Joint Model for Video and Language Representation Learning

Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube. Whereas most existing approaches learn low-level representations, we propose a joint…

Computer Vision and Pattern Recognition · Computer Science 2019-09-13 Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech…

Sound · Computer Science 2024-01-31 Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

Coreferential Reasoning Learning for Language Representation

Language representation models such as BERT could effectively capture contextual semantic information from plain text, and have been proved to achieve promising results in lots of downstream NLP tasks with appropriate fine-tuning. However,…

Computation and Language · Computer Science 2020-10-07 Deming Ye, Yankai Lin, Jiaju Du, Zhenghao Liu, Peng Li, Maosong Sun, Zhiyuan Liu

SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation

Code representation learning, which aims to encode the semantics of source code into distributed vectors, plays an important role in recent deep-learning-based models for code intelligence. Recently, many pre-trained language models for…

Computation and Language · Computer Science 2021-09-10 Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, Xin Jiang

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

Unifying acoustic and linguistic representation learning has become increasingly crucial to transfer the knowledge learned on the abundance of high-resource language data for low-resource speech recognition. Existing approaches simply…

Computation and Language · Computer Science 2021-10-12 Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin