Related papers: Multi-task Voice Activated Framework using Self-su…

200 papers

Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially ultra-low…

Sound · Computer Science 2021-01-15 Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-10 Marie Kunešová, Zbyněk Zajíc

Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes…

Machine Learning · Computer Science 2023-06-16 Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the…

Computation and Language · Computer Science 2022-12-06 Ankita Pasad, Ju-Chieh Chou, Karen Livescu

Self-supervised speech pre-training methods have developed rapidly in recent years and have been shown to be very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing is suffering from the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-09 Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai

Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-25 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the…

Computation and Language · Computer Science 2020-10-23 Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix…

Sound · Computer Science 2023-05-29 Nik Vaessen, David A. van Leeuwen

Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech…

Computation and Language · Computer Science 2021-10-19 Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli

Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It was shown that wav2vec2.0 has good robustness against domain shift, while the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-10 Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai

Speech representation learning with self-supervised algorithms has resulted in notable performance boosts in many downstream tasks. Recent work combined self-supervised learning (SSL) and visually grounded speech (VGS) processing mechanisms…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-08 Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model…

Computation and Language · Computer Science 2019-09-12 Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised…

Machine Learning · Computer Science 2022-10-27 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

Wav2Vec2.0 is a state-of-the-art model that learns speech representations from unlabeled speech data, i.e., through self-supervised learning. The pretrained model is then fine-tuned on small amounts of labeled data to use it for speech-to-text…

Sound · Computer Science 2022-02-15 Santosh Gondi

Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems. However, existing methods are either not entirely end-to-end or do not train joint…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-23 Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts…

Explaining the decisions made by audio spoofing detection models is crucial for fostering trust in detection outcomes. However, current research on the interpretability of detection models is limited to applying XAI tools to post-trained…

Sound · Computer Science 2025-07-28 Menglu Li, Xiao-Ping Zhang

Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from…

Computation and Language · Computer Science 2022-05-26 Ngoc-Quan Pham, Alex Waibel, Jan Niehues

Self-supervised learning, such as with the wav2vec 2.0 framework, significantly improves the accuracy of end-to-end automatic speech recognition (ASR). Wav2vec 2.0 has been applied to single-channel end-to-end ASR models. In this work, we…

Computation and Language · Computer Science 2024-08-07 Atsushi Kojima