Related papers: A Robust Frame-based Nonlinear Prediction System f…

200 papers

Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies…

Computation and Language · Computer Science 2020-11-03 Alexander H. Liu, Yu-An Chung, James Glass

Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural network. Our approach consists in…

Computation and Language · Computer Science 2017-05-30 Paul Michel, Okko Räsänen, Roland Thiollière, Emmanuel Dupoux

Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, we term Neural Predictive Coding (NPC), to learn speaker-specific…

Sound · Computer Science 2019-07-18 Arindam Jati, Panayiotis Georgiou

Automatic detection of phoneme or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ self-supervised training methods, such as contrastive predictive coding (CPC), where the next frame…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-07 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Low and ultra-low-bitrate neural speech coding achieves unprecedented coding gain by generating speech signals from compact speech features. This paper introduces additional coding efficiency in neural speech coding by reducing the temporal…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-07 Haici Yang, Wootaek Lim, Minje Kim

Typically, unsupervised segmentation of speech into the phone and word-like units are treated as separate tasks and are often done via different methods which do not fully leverage the inter-dependence of the two tasks. Here, we unify them…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-12 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience. Training such models, however, is quite inefficient and unstable. In this work, we show how by simply changing the temporal…

Neural and Evolutionary Computing · Computer Science 2024-02-08 Tommaso Salvatori, Yuhang Song, Yordan Yordanov, Beren Millidge, Zhenghua Xu, Lei Sha, Cornelius Emde, Rafal Bogacz, Thomas Lukasiewicz

In this paper, we describe a statistical parametric speech synthesis approach with unit-level acoustic representation. In conventional deep neural network based speech synthesis, the input text features are repeated for the entire duration…

Sound · Computer Science 2016-06-21 Sivanand Achanta, KNRK Raju Alluri, Suryakanth V Gangashetty

Modern compression algorithms are often the result of laborious domain-specific research; industry standards such as MP3, JPEG, and AMR-WB took years to develop and were largely hand-designed. We present a deep neural network model which…

Sound · Computer Science 2021-07-09 Srihari Kankanahalli

We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and…

Sound · Computer Science 2022-12-14 Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech. One example is Autoregressive Predictive Coding (Chung et al., 2019), which trains an…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-14 Yu-An Chung, James Glass

Many recent studies have shown that the perception of speech can be decoded from brain signals and subsequently reconstructed as continuous language. However, there is a lack of neurological basis for how the semantic information embedded…

Computation and Language · Computer Science 2026-04-14 Congchi Yin, Ziyi Ye, Piji Li

Recently several papers have been published on nonlinear prediction applied to speech coding. At ICASSP98 we presented a system based on an ADPCM scheme with a nonlinear predictor based on a neural net. The most critical parameter was the…

Sound · Computer Science 2022-03-07 Marcos Faundez-Zanuy

We introduce here a predictive coding based model that aims to generate accurate and sharp future frames. Inspired by the predictive coding hypothesis and related works, the total model is updated through a combination of bottom-up and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-12 Chaofan Ling, Weihua Li, Junpei Zhong

Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind…

Sound · Computer Science 2025-10-16 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

Neural audio coding has shown very promising results recently in the literature to largely outperform traditional codecs but limited attention has been paid on its error resilience. Neural codecs trained considering only source coding tend…

Sound · Computer Science 2022-07-05 Huaying Xue, Xiulian Peng, Xue Jiang, Yan Lu

Neural network models using predictive coding are interesting from the viewpoint of computational modelling of human language acquisition, where the objective is to understand how linguistic units could be learned from speech without any…

Computation and Language · Computer Science 2020-07-09 María Andrea Cruz Blandón, Okko Räsänen

In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-24 Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC). Our experiments show that with the existing…

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings. The embedding can be extracted efficiently with linear activation in the embedding layer. To understand…

Audio and Speech Processing · Electrical Eng. & Systems 2018-09-13 Suwon Shon, Hao Tang, James Glass