Related papers: A Robust Frame-based Nonlinear Prediction System f…

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies…

Computation and Language · Computer Science 2020-11-03 Alexander H. Liu, Yu-An Chung, James Glass

Blind phoneme segmentation with temporal prediction errors

Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists in…

Computation and Language · Computer Science 2017-05-30 Paul Michel, Okko Räsänen, Roland Thiollière, Emmanuel Dupoux

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific…

Sound · Computer Science 2019-07-18 Arindam Jati, Panayiotis Georgiou

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

Automatic detection of phoneme or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ self-supervised training methods, such as contrastive predictive coding (CPC), where the next frame…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-07 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding

Low and ultra-low-bitrate neural speech coding achieves unprecedented coding gain by generating speech signals from compact speech features. This paper introduces additional coding efficiency in neural speech coding by reducing the temporal…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-07 Haici Yang, Wootaek Lim, Minje Kim

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Typically, unsupervised segmentation of speech into phone-like and word-like units is treated as two separate tasks, often handled by different methods that do not fully leverage the inter-dependence of the two tasks. Here, we unify them…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-12 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks

Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience. Training such models, however, is quite inefficient and unstable. In this work, we show how by simply changing the temporal…

Neural and Evolutionary Computing · Computer Science 2024-02-08 Tommaso Salvatori, Yuhang Song, Yordan Yordanov, Beren Millidge, Zhenghua Xu, Lei Sha, Cornelius Emde, Rafal Bogacz, Thomas Lukasiewicz

Statistical Parametric Speech Synthesis Using Bottleneck Representation From Sequence Auto-encoder

In this paper, we describe a statistical parametric speech synthesis approach with unit-level acoustic representation. In conventional deep neural network based speech synthesis, the input text features are repeated for the entire duration…

Sound · Computer Science 2016-06-21 Sivanand Achanta, KNRK Raju Alluri, Suryakanth V Gangashetty

End-to-End Optimized Speech Coding with Deep Neural Networks

Modern compression algorithms are often the result of laborious domain-specific research; industry standards such as MP3, JPEG, and AMR-WB took years to develop and were largely hand-designed. We present a deep neural network model which…

Sound · Computer Science 2021-07-09 Srihari Kankanahalli

Variational Speech Waveform Compression to Catalyze Semantic Communications

We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and…

Sound · Computer Science 2022-12-14 Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech. One example is Autoregressive Predictive Coding (Chung et al., 2019), which trains an…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-14 Yu-An Chung, James Glass

Language Reconstruction with Brain Predictive Coding from fMRI Data

Many recent studies have shown that the perception of speech can be decoded from brain signals and subsequently reconstructed as continuous language. However, there is a lack of neurological basis for how the semantic information embedded…

Computation and Language · Computer Science 2026-04-14 Congchi Yin, Ziyi Ye, Piji Li

Nonlinear predictive models computation in ADPCM schemes

Recently several papers have been published on nonlinear prediction applied to speech coding. At ICASSP98 we presented a system based on an ADPCM scheme with a nonlinear predictor based on a neural net. The most critical parameter was the…

Sound · Computer Science 2022-03-07 Marcos Faundez-Zanuy

Anti-aliasing Predictive Coding Network for Future Video Frame Prediction

We introduce here a predictive coding based model that aims to generate accurate and sharp future frames. Inspired by the predictive coding hypothesis and related works, the total model is updated through a combination of bottom-up and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-12 Chaofan Ling, Weihua Li, Junpei Zhong

Latent-Domain Predictive Neural Speech Coding

Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind…

Sound · Computer Science 2025-10-16 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

Towards Error-Resilient Neural Speech Coding

Neural audio coding has recently shown very promising results in the literature, largely outperforming traditional codecs, but limited attention has been paid to its error resilience. Neural codecs trained considering only source coding tend…

Sound · Computer Science 2022-07-05 Huaying Xue, Xiulian Peng, Xue Jiang, Yan Lu

Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets

Neural network models using predictive coding are interesting from the viewpoint of computational modelling of human language acquisition, where the objective is to understand how linguistic units could be learned from speech without any…

Computation and Language · Computer Science 2020-07-09 María Andrea Cruz Blandón, Okko Räsänen

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-24 Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words

We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC). Our experiments show that with the existing…

Machine Learning · Computer Science 2024-09-13 Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings. The embedding can be extracted efficiently with linear activation in the embedding layer. To understand…

Audio and Speech Processing · Electrical Eng. & Systems 2018-09-13 Suwon Shon, Hao Tang, James Glass