Related papers: Diff-ETS: Learning a Diffusion Probabilistic Model…

Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG

Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-28 Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee

Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling

Voiced Electromyography (EMG)-to-Speech (V-ETS) models reconstruct speech from muscle activity signals, facilitating applications such as neurolaryngologic diagnostics. Despite its potential, the advancement of V-ETS is hindered by a…

Sound · Computer Science 2026-01-13 Xiaodan Chen, Xiaoxue Gao, Mathias Quoy, Alexandre Pitti, Nancy F. Chen

ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning

Diffusion models have demonstrated remarkable performance in speech synthesis, but typically require multi-step sampling, resulting in low inference efficiency. Recent studies address this issue by distilling diffusion models into…

Sound · Computer Science 2025-10-08 Tao Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods primarily focus on the synthesis of a limited number of emotion types and have achieved unsatisfactory…

Sound · Computer Science 2023-06-02 Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

An Improved Model for Voicing Silent Speech

In this paper, we present an improved model for voicing silent speech, where audio is synthesized from facial electromyography (EMG) signals. To give our model greater flexibility to learn its own input features, we directly use EMG signals…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-22 David Gaddy, Dan Klein

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Brain-Driven Representation Learning Based on Diffusion Model

Interpreting EEG signals linked to spoken language presents a complex challenge, given the data's intricate temporal and spatial attributes, as well as the various noise factors. Denoising diffusion probabilistic models (DDPMs), which have…

Computation and Language · Computer Science 2023-11-15 Soowon Kim, Seo-Hyun Lee, Young-Eun Lee, Ji-Won Lee, Ji-Ha Park, Seong-Whan Lee

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

We propose Easy End-to-End Diffusion-based Text to Speech, a simple and efficient end-to-end text-to-speech model based on diffusion. E3 TTS directly takes plain text as input and generates an audio waveform through an iterative refinement…

Sound · Computer Science 2023-11-03 Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

Existing emotional speech synthesis methods often utilize an utterance-level style embedding extracted from reference audio, neglecting the inherent multi-scale property of speech prosody. We introduce ED-TTS, a multi-scale emotional speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data

Creating synthetic voices with found data is challenging, as real-world recordings often contain various types of audio degradation. One way to address this problem is to pre-enhance the speech with an enhancement model and then use the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-03 Yusheng Tian, Wei Liu, Tan Lee

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

While the performance of cross-lingual TTS based on monolingual corpora has been significantly improved recently, generating cross-lingual speech still suffers from the foreign accent problem, leading to limited naturalness. Besides,…

Sound · Computer Science 2023-09-06 Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie

EEG Synthetic Data Generation Using Probabilistic Diffusion Models

Electroencephalography (EEG) plays a significant role in the Brain Computer Interface (BCI) domain, due to its non-invasive nature, low cost, and ease of use, making it a highly desirable option for widespread adoption by the general…

Signal Processing · Electrical Eng. & Systems 2023-03-13 Giulio Tosato, Cesare M. Dalbagno, Francesco Fumagalli

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-15 Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech

State-of-the-art speech synthesis models try to get as close as possible to the human voice. Hence, modelling emotions is an essential part of Text-To-Speech (TTS) research. In our work, we selected FastSpeech2 as the starting point and…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-04 Daria Diatlova, Vitaly Shutov

Digital Voicing of Silent Speech

In this paper, we consider the task of digitally voicing silent speech, where silently mouthed words are converted to audible speech based on electromyography (EMG) sensor measurements that capture muscle impulses. While prior work has…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-08 David Gaddy, Dan Klein

Target Speech Extraction with Conditional Diffusion Model

Diffusion model-based speech enhancement has received increased attention since it can generate very natural enhanced signals and generalizes well to unseen conditions. Diffusion models have been explored for several sub-tasks of speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-21 Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate on the latent domain with cascaded phase recovery modules to reconstruct waveform. This poses challenges when generating high-fidelity audio. In…

Sound · Computer Science 2023-11-21 Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan

SEED: Speaker Embedding Enhancement Diffusion Model

A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 KiHyun Nam, Jungwoo Heo, Jee-weon Jung, Gangin Park, Chaeyoung Jung, Ha-Jin Yu, Joon Son Chung

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions while stochastic calculus has provided a unified point of view on these techniques allowing…

Machine Learning · Computer Science 2021-08-06 Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov

ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech. Existing approaches, however, only aim to produce emotional TTS for seen…

Sound · Computer Science 2023-05-24 Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang