English
Related papers

Related papers: Codec2Vec: Self-Supervised Speech Representation L…

200 papers

Neural audio codecs have revolutionized audio processing by enabling speech tasks to be performed on highly compressed representations. Recent work has shown that speech separation can be achieved within these compressed domains, offering…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-28 Jia Qi Yip , Chin Yuen Kwok , Bin Ma , Eng Siong Chng

Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems. However, existing methods are either not entirely end-to-end or do not train joint…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-23 Jiachen Lian , Alexei Baevski , Wei-Ning Hsu , Michael Auli

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-26 Kunal Dhawan , Nithin Rao Koluguri , Ante Jukić , Ryan Langman , Jagadeesh Balam , Boris Ginsburg

Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes…

Machine Learning · Computer Science 2023-06-16 Alexei Baevski , Arun Babu , Wei-Ning Hsu , Michael Auli

Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-25 Samik Sadhu , Di He , Che-Wei Huang , Sri Harish Mallidi , Minhua Wu , Ariya Rastrow , Andreas Stolcke , Jasha Droppo , Roland Maas

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Recent work on intracranial brain-machine interfaces has demonstrated that spoken speech can be decoded with high accuracy, essentially by treating the problem as an instance of supervised learning and training deep neural networks to map…

Neurons and Cognition · Quantitative Biology 2024-05-30 Brian A. Yuan , Joseph G. Makin

Neural speech codecs aim to compress input signals into minimal bits while maintaining content quality in a low-latency manner. However, existing neural codecs often trade model complexity for reconstruction performance. These codecs…

Sound · Computer Science 2024-10-04 Yuzhe Gu , Enmao Diao

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised…

Machine Learning · Computer Science 2022-10-27 Alexei Baevski , Wei-Ning Hsu , Qiantong Xu , Arun Babu , Jiatao Gu , Michael Auli

The advent of neural audio codecs has increased in popularity due to their potential for efficiently modeling audio with transformers. Such advanced codecs represent audio from a highly continuous waveform to low-sampled discrete units. In…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Samir Sadok , Julien Hauret , Éric Bavu

Current large speech language models are mainly based on semantic tokens from discretization of self-supervised learned representations and acoustic tokens from a neural codec, following a semantic-modeling and acoustic-synthesis paradigm.…

Sound · Computer Science 2025-10-16 Xue Jiang , Xiulian Peng , Yuan Zhang , Yan Lu

Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into…

Sound · Computer Science 2024-10-22 Peiji Yang , Fengping Wang , Yicheng Zhong , Huawei Wei , Zhisheng Wang

Universal audio codecs learn entangled representations across audio types, whereas some specific codecs offer decoupled representations but are limited to speech. Real-world audio, however, often contains mixed speech and background sounds,…

Sound · Computer Science 2025-09-12 Xiaoxue Luo , Jinwei Huang , Runyan Yang , Yingying Gao , Junlan Feng , Chao Deng , Shilei Zhang

The representation learning of speech, without textual resources, is an area of significant interest for many low resource speech applications. In this paper, we describe an approach to self-supervised representation learning from raw audio…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-17 Varun Krishna , Tarun Sai , Sriram Ganapathy

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to…

Computation and Language · Computer Science 2018-06-12 Yu-An Chung , James Glass

With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-23 Zhichao Huang , Chutong Meng , Tom Ko

Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance to…

Sound · Computer Science 2023-07-06 Chutong Meng , Junyi Ao , Tom Ko , Mingxuan Wang , Haizhou Li

Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content,…

In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain…

Computation and Language · Computer Science 2017-11-07 Yu-An Chung , James Glass

The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music and environmental sounds. To approach this problem, methods inspired by works on…

Sound · Computer Science 2024-05-22 Leonardo Pepino , Pablo Riera , Luciana Ferrer
‹ Prev 1 2 3 10 Next ›