Related papers: Generative Speech Coding with Predictive Variance …

Robust Speech Representation Learning via Flow-based Embedding Regularization

In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although deep learning-based embedding extraction methods have shown good performance in…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-08 · Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep learning-based enhancement methods often struggle to effectively remove background noise and reverberation in…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-09-19 · Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

Combined Generative and Predictive Modeling for Speech Super-resolution

Speech super-resolution (SR) is the task that restores high-resolution speech from low-resolution input. Existing models employ simulated data and constrained experimental settings, which limit generalization to real-world SR. Predictive…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-01-26 · Heming Wang, Eric W. Healy, DeLiang Wang

Enhancing Noise Robustness for Neural Speech Codecs through Resource-Efficient Progressive Quantization Perturbation Simulation

Noise robustness remains a critical challenge for deploying neural speech codecs in real-world acoustic scenarios where background noise is often inevitable. A key observation we make is that even slight input noise perturbations can cause…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-14 · Rui-Chen Zheng, Yang Ai, Hui-Peng Du, Li-Rong Dai

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-11 · Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Learning Robust Representations of Text

Deep neural networks have achieved remarkable results across many language processing tasks; however, these methods are highly sensitive to noise and adversarial attacks. We present a regularization-based method for limiting network…

Computation and Language · Computer Science · 2016-09-21 · Yitong Li, Trevor Cohn, Timothy Baldwin

Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

The intelligibility of speech severely degrades in the presence of environmental noise and reverberation. In this paper, we propose a novel deep learning based system for modifying the speech signal to increase its intelligibility under the…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-09-17 · Haoyu Li, Junichi Yamagishi

Reduction of finite sampling noise in quantum neural networks

Quantum neural networks (QNNs) use parameterized quantum circuits with data-dependent inputs and generate outputs through the evaluation of expectation values. Calculating these expectation values necessitates repeated circuit evaluations,…

Quantum Physics · Physics · 2024-06-26 · David A. Kreplin, Marco Roth

Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning

Likelihood training and maximization-based decoding result in dull and repetitive generated texts even when using powerful language models (Holtzman et al., 2019). Adding a loss function for regularization was shown to improve text…

Computation and Language · Computer Science · 2021-01-13 · Evgeny Lagutin, Daniil Gavrilov, Pavel Kalaidin

DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration

In this work, we propose a full-band real-time speech enhancement system with GAN-based stochastic regeneration. Predictive models focus on estimating the mean of the target distribution, whereas generative models aim to learn the full…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-05-30 · Sanberk Serbest, Tijana Stojkovic, Milos Cernak, Andrew Harper

Enhance audio generation controllability through representation similarity regularization

This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the…

Sound · Computer Science · 2023-09-19 · Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra

Taming Repetition in Dialogue Generation

The wave of pre-trained language models has continuously improved the quality of machine-generated conversations; however, some of the generated responses still suffer from excessive repetition, sometimes repeating words from…

Computation and Language · Computer Science · 2021-12-17 · Yadong Xi, Jiashu Pu, Xiaoxi Mao

Regularizing Contrastive Predictive Coding for Speech Applications

Self-supervised methods such as Contrastive Predictive Coding (CPC) have greatly improved the quality of unsupervised representations. These representations significantly reduce the amount of labeled data needed for downstream task…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-04-27 · Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Regularizing towards Causal Invariance: Linear Models with Proxies

We propose a method for learning linear models whose predictive performance is robust to causal interventions on unobserved variables, when noisy proxies of those variables are available. Our approach takes the form of a regularization term…

Machine Learning · Computer Science · 2021-06-29 · Michael Oberst, Nikolaj Thams, Jonas Peters, David Sontag

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from…

Computation and Language · Computer Science · 2023-05-30 · Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders

This study compares the performance of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates is…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-03-27 · Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

Rectified Noise: A Generative Model Using Positive-incentive Noise

Rectified Flow (RF) has been widely used as an effective generative model. Although RF is primarily based on probability flow Ordinary Differential Equations (ODEs), recent studies have shown that injecting noise through reverse-time…

Machine Learning · Computer Science · 2025-11-13 · Zhenyu Gu, Yanchen Xu, Sida Huang, Yubin Guo, Hongyuan Zhang

Analysis by Adversarial Synthesis -- A Novel Approach for Speech Vocoding

Classical parametric speech coding techniques provide a compact representation for speech signals. This affords a very low transmission rate but with a reduced perceptual quality of the reconstructed signals. Recently, autoregressive deep…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-02 · Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier

Discriminative Regularization for Generative Models

We explore the question of whether the representations learned by classifiers can be used to enhance the quality of generative models. Our conjecture is that labels correspond to characteristics of natural data which are most salient to…

Machine Learning · Statistics · 2016-02-16 · Alex Lamb, Vincent Dumoulin, Aaron Courville

Speech Signal Improvement Using Causal Generative Diffusion Models

In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. The method is based on a generative diffusion model which has been shown to work well in scenarios with missing…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-16 · Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann