Related papers: Multi-task self-supervised learning for Robust Spe…

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some…

Machine Learning · Computer Science 2019-04-09 Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

PASE: Phoneme-Aware Speech Encoder to Improve Lip Sync Accuracy for Talking Head Synthesis

Recent talking head synthesis works typically adopt speech features extracted from large-scale pre-trained acoustic models. However, the intrinsic many-to-many relationship between speech and lip motion causes phoneme-viseme alignment…

Graphics · Computer Science 2025-10-16 Yihuan Huang, Jiajun Liu, Yanzhen Ren, Jun Xue, Wuyang Liu, Zongkun Sun

Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech…

Computation and Language · Computer Science 2021-12-15 Pierre Beckmann, Mikolaj Kegler, Milos Cernak

PAS-SE: Personalized Auxiliary-Sensor Speech Enhancement for Voice Pickup in Hearables

Speech enhancement for voice pickup in hearables aims to improve the user's voice by suppressing noise and interfering talkers, while maintaining own-voice quality. For single-channel methods, it is particularly challenging to distinguish…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-05 Mattes Ohlenbusch, Mikolaj Kegler, Marko Stamenovic

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Personalized speech enhancement (PSE) models utilize additional cues, such as speaker embeddings like d-vectors, to remove background noise and interfering speech in real-time and thus improve the speech quality of online video conferencing…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-20 Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement

Generative models have shown remarkable performance in speech enhancement (SE), achieving superior perceptual quality over traditional discriminative approaches. However, existing generative SE approaches often overlook the risk of…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-18 Xiaobin Rong, Qinwen Hu, Mansur Yesilbursa, Kamil Wojcicki, Jing Lu

Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training

In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve…

Sound · Computer Science 2022-01-02 Yi Li, Yang Sun, Syed Mohsen Naqvi

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR

Speech enhancement (SE) is usually required as a front end to improve the speech quality in noisy environments, while the enhanced speech might not be optimal for automatic speech recognition (ASR) systems due to speech distortion. On the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-27 Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-21 Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

Automatic speech recognition (ASR) has gained remarkable successes thanks to recent advances of deep learning, but it usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as…

Audio and Speech Processing · Electrical Eng. & Systems 2024-04-19 Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-10 Marie Kunešová, Zbyněk Zajíc

Unsupervised speech representation learning using WaveNet autoencoders

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-18 Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Self-supervised learning has been proved to benefit a wide range of speech processing tasks, such as speech recognition/translation, speaker verification and diarization, etc. However, most of current approaches are computationally…

Sound · Computer Science 2025-01-22 He Huang, Taejin Park, Kunal Dhawan, Ivan Medennikov, Krishna C. Puvvada, Nithin Rao Koluguri, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the…

Sound · Computer Science 2024-08-27 Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing

Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve…

Computation and Language · Computer Science 2022-10-25 Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have…

Sound · Computer Science 2022-01-10 Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

Personalized Speech Enhancement Without a Separate Speaker Embedding Model

Personalized speech enhancement (PSE) models can improve the audio quality of teleconferencing systems by adapting to the characteristics of a speaker's voice. However, most existing methods require a separate speaker embedding model to…

Sound · Computer Science 2024-06-17 Tanel Pärnamaa, Ando Saabas

Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Many speech enhancement methods try to learn the relationship between noisy and clean speech, obtained using an acoustic room simulator. We point out several limitations of enhancement methods relying on clean speech targets; the goal of…

Computation and Language · Computer Science 2018-12-26 Geonmin Kim, Hwaran Lee, Bo-Kyeong Kim, Sang-Hoon Oh, Soo-Young Lee