Related papers: Multi-task self-supervised learning for Robust Spe…

200 papers

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some…

Machine Learning · Computer Science · 2019-04-09 · Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Recent talking head synthesis works typically adopt speech features extracted from large-scale pre-trained acoustic models. However, the intrinsic many-to-many relationship between speech and lip motion causes phoneme-viseme alignment…

Graphics · Computer Science · 2025-10-16 · Yihuan Huang, Jiajun Liu, Yanzhen Ren, Jun Xue, Wuyang Liu, Zongkun Sun

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech…

Computation and Language · Computer Science · 2021-12-15 · Pierre Beckmann, Mikolaj Kegler, Milos Cernak

Speech enhancement for voice pickup in hearables aims to improve the user's voice by suppressing noise and interfering talkers, while maintaining own-voice quality. For single-channel methods, it is particularly challenging to distinguish…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-05 · Mattes Ohlenbusch, Mikolaj Kegler, Marko Stamenovic

Personalized speech enhancement (PSE) models utilize additional cues, such as speaker embeddings like d-vectors, to remove background noise and interfering speech in real-time and thus improve the speech quality of online video conferencing…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-20 · Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Generative models have shown remarkable performance in speech enhancement (SE), achieving superior perceptual quality over traditional discriminative approaches. However, existing generative SE approaches often overlook the risk of…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-11-18 · Xiaobin Rong, Qinwen Hu, Mansur Yesilbursa, Kamil Wojcicki, Jing Lu

In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve…

Sound · Computer Science · 2022-01-02 · Yi Li, Yang Sun, Syed Mohsen Naqvi

With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-05 · Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

Speech enhancement (SE) is usually required as a front end to improve the speech quality in noisy environments, while the enhanced speech might not be optimal for automatic speech recognition (ASR) systems due to speech distortion. On the…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-05-27 · Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-07-21 · Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu

Automatic speech recognition (ASR) has achieved remarkable success thanks to recent advances in deep learning, but it usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-04-19 · Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-05-10 · Marie Kunešová, Zbyněk Zajíc

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science · 2019-09-12 · Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-05-18 · Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Self-supervised learning has been shown to benefit a wide range of speech processing tasks, such as speech recognition and translation, speaker verification, and diarization. However, most current approaches are computationally…

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science · 2024-08-27 · Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve…

Computation and Language · Computer Science · 2022-10-25 · Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Personalized speech enhancement (PSE) models can improve the audio quality of teleconferencing systems by adapting to the characteristics of a speaker's voice. However, most existing methods require a separate speaker embedding model to…

Sound · Computer Science · 2024-06-17 · Tanel Pärnamaa, Ando Saabas

Many speech enhancement methods try to learn the relationship between noisy and clean speech, obtained using an acoustic room simulator. We point out several limitations of enhancement methods relying on clean speech targets; the goal of…

Computation and Language · Computer Science · 2018-12-26 · Geonmin Kim, Hwaran Lee, Bo-Kyeong Kim, Sang-Hoon Oh, Soo-Young Lee