Related papers: DiffSSD: A Diffusion-Based Dataset For Speech Fore…

Diffuse or Confuse: A Diffusion Deepfake Speech Dataset

Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion…

Cryptography and Security · Computer Science 2025-01-15 Anton Firc, Kamil Malinka, Petr Hanáček

FairSSD: Understanding Bias in Synthetic Speech Detectors

Methods that can generate synthetic speech which is perceptually indistinguishable from speech recorded by a human speaker are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit…

Computer Vision and Pattern Recognition · Computer Science 2024-04-18 Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

As speech generation technology advances, the risk of misuse through deepfake audio has become a pressing concern, which underscores the critical need for robust detection systems. However, many existing speech deepfake datasets are limited…

Sound · Computer Science 2025-07-30 Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

SpMis: An Investigation of Synthetic Spoken Misinformation Detection

In recent years, speech generation technology has advanced rapidly, fueled by generative models and large-scale training techniques. While these developments have enabled the production of high-quality synthetic speech, they have also…

Computation and Language · Computer Science 2024-09-18 Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, Zhizheng Wu

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by…

Sound · Computer Science 2025-04-16 Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe

DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training

Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved. Traditional approaches have relied on the autoregressive method to predict the quantized…

Sound · Computer Science 2025-01-22 Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee

ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts

The problem of synthetic speech detection has enjoyed considerable attention, with recent methods achieving low error rates across several established benchmarks. However, to what extent can low error rates on academic benchmarks translate…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 Ashi Garg, Zexin Cai, Lin Zhang, Henry Li Xinyuan, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews

The DeepSpeak Dataset

Deepfakes represent a growing concern across domains such as disinformation, fraud, and non-consensual media. In particular, the rise of video conference and identity-driven attacks in high-stakes scenarios--such as impostor hiring--demands…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Sarah Barrington, Maty Bohacek, Hany Farid

AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

Speech synthesis systems can now produce highly realistic vocalisations that pose significant authenticity challenges. Despite substantial progress in deepfake detection models, their real-world effectiveness is often undermined by evolving…

Sound · Computer Science 2026-02-12 Qizhou Wang, Hanxun Huang, Guansong Pang, Sarah Erfani, Christopher Leckie

Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic…

Sound · Computer Science 2022-10-17 Emily R. Bartusiak, Edward J. Delp

Open Challenges in Synthetic Speech Detection

In this paper the current status and open challenges of synthetic speech detection are addressed. The work comprises an initial analysis of available open datasets and of existing detection methods, a description of the requirements for new…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-27 Luca Cuccovillo, Christoforos Papastergiopoulos, Anastasios Vafeiadis, Artem Yaroshchuk, Patrick Aichroth, Konstantinos Votis, Dimitrios Tzovaras

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection

Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-24 Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary Miller, Jet Vonk, Brittany Morin, Maria Gorno-Tempini, Gopala Anumanchipalli

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

Mainstream zero-shot TTS production systems like Voicebox and Seed-TTS achieve human parity speech by leveraging Flow-matching and Diffusion models, respectively. Unfortunately, human-level audio synthesis leads to identity misuse and…

Sound · Computer Science 2024-09-16 Jiawei Du, I-Ming Lin, I-Hsiang Chiu, Xuanjun Chen, Haibin Wu, Wenze Ren, Yu Tsao, Hung-yi Lee, Jyh-Shing Roger Jang

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models

Conversational speech synthesis (CSS) aims to synthesize both contextually appropriate and expressive speech, and considerable efforts have been made to enhance the understanding of conversational context. However, existing CSS systems are…

Sound · Computer Science 2025-02-28 Weihao wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu

All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection

Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever, leading to possible threats and dangers from malicious users. In the audio field, we are…

Sound · Computer Science 2023-07-31 Daniele Mari, Davide Salvi, Paolo Bestagini, Simone Milani

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-15 Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio.…

Sound · Computer Science 2024-04-05 Jiangyan Yi, Chenglong Wang, Jianhua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-25 Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

DDS: A new device-degraded speech dataset for speech enhancement

A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality device-degraded speech into…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-23 Haoyu Li, Junichi Yamagishi

EnvSDD: Benchmarking Environmental Sound Deepfake Detection

Audio generation systems now create very realistic soundscapes that can enhance media production, but also pose potential risks. Several studies have examined deepfakes in speech or singing voice. However, environmental sounds have…

Sound · Computer Science 2025-09-30 Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Haohe Liu, Wenwu Wang, Mark D Plumbley