Related papers: Speech Enhancement with Multi-granularity Vector Q…

200 papers

With the development of deep learning, neural network-based speech enhancement (SE) models have shown excellent performance. Meanwhile, it was shown that the development of self-supervised pre-trained models can be applied to various…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-29 Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang

Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label…

Sound · Computer Science 2024-02-27 Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

Recent research has delved into speech enhancement (SE) approaches that leverage audio embeddings from pre-trained models, diverging from time-frequency masking or signal prediction techniques. This paper introduces an efficient and…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-16 Xingwei Sun, Heinrich Dinkel, Yadong Niu, Linzhang Wang, Junbo Zhang, Jian Luan

In this paper, we explore vector quantization for acoustic unit discovery. Leveraging unlabelled data, we aim to learn discrete representations of speech that separate phonetic content from speaker-specific details. We propose two neural…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-20 Benjamin van Niekerk, Leanne Nortje, Herman Kamper

The deep learning-based speech enhancement (SE) methods always take the clean speech's waveform or time-frequency spectrum feature as the learning target, and train the deep neural network (DNN) by reducing the error loss between the DNN's…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-02 Yuewei Zhang, Huanbin Zou, Jie Zhu

Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another one's while preserving the linguistic content. It is still a challenging work, especially in a one-shot setting.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-09 Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee

Most recent studies on deep learning based speech enhancement (SE) focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-08 Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-07 Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang

Recently, speech enhancement (SE) based on deep speech prior has attracted much attention, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture. Compared to conventional approaches that…

Sound · Computer Science 2020-11-05 Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-22 Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Real-time speech enhancement (SE) is essential to online speech communication. Causal SE models use only the previous context while predicting future information, such as phoneme continuation, may help performing causal SE. The phonetic…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-30 Emiru Tsunoo, Yuki Saito, Wataru Nakata, Hiroshi Saruwatari

Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream…

Machine Learning · Computer Science 2025-07-10 Wonjin Jung, Sungil Kang, Dong-Yeon Cho

Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus only on addressing audio information. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent…

Sound · Computer Science 2022-04-19 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus only on addressing audio information. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent…

Sound · Computer Science 2018-01-25 Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

Recent years have seen remarkable progress in speech emotion recognition (SER), thanks to advances in deep learning techniques. However, the limited availability of labeled data remains a significant challenge in the field. Self-supervised…

Sound · Computer Science 2023-04-24 Samir Sadok, Simon Leglaive, Renaud Séguier

Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-16 Hejung Yang, Hong-Goo Kang

Recently, generative speech enhancement has garnered considerable interest; however, existing approaches are hindered by excessive complexity, limited efficiency, and suboptimal speech quality. To overcome these challenges, this paper…

Sound · Computer Science 2026-02-03 Fei Liu, Yang Ai

Speech enhancement (SE) is usually required as a front end to improve the speech quality in noisy environments, while the enhanced speech might not be optimal for automatic speech recognition (ASR) systems due to speech distortion. On the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-27 Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

Vector Quantization (VQ) has become the cornerstone of tokenization for many multimodal Large Language Models and diffusion synthesis. However, existing VQ paradigms suffer from a fundamental conflict: they enforce discretization before the…

Machine Learning · Computer Science 2026-03-25 Wenhao Zhao, Qiran Zou, Zhouhan Lin, Dianbo Liu

Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so different models need to be trained for each target bitrate, which…

Sound · Computer Science 2022-07-08 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu