
Related papers: DiffSSD: A Diffusion-Based Dataset For Speech Fore…

200 papers

Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion…

Cryptography and Security · Computer Science 2025-01-15 Anton Firc, Kamil Malinka, Petr Hanáček

Methods that can generate synthetic speech that is perceptually indistinguishable from speech recorded by a human speaker are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit…

Computer Vision and Pattern Recognition · Computer Science 2024-04-18 Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

As speech generation technology advances, the risk of misuse through deepfake audio has become a pressing concern, which underscores the critical need for robust detection systems. However, many existing speech deepfake datasets are limited…

Sound · Computer Science 2025-07-30 Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

In recent years, speech generation technology has advanced rapidly, fueled by generative models and large-scale training techniques. While these developments have enabled the production of high-quality synthetic speech, they have also…

Computation and Language · Computer Science 2024-09-18 Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, Zhizheng Wu

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by…

Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved. Traditional approaches have relied on the autoregressive method to predict the quantized…

Sound · Computer Science 2025-01-22 Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee

The problem of synthetic speech detection has enjoyed considerable attention, with recent methods achieving low error rates across several established benchmarks. However, to what extent can low error rates on academic benchmarks translate…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 Ashi Garg, Zexin Cai, Lin Zhang, Henry Li Xinyuan, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews

Deepfakes represent a growing concern across domains such as disinformation, fraud, and non-consensual media. In particular, the rise of video conference and identity-driven attacks in high-stakes scenarios--such as impostor hiring--demands…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Sarah Barrington, Maty Bohacek, Hany Farid

Speech synthesis systems can now produce highly realistic vocalisations that pose significant authenticity challenges. Despite substantial progress in deepfake detection models, their real-world effectiveness is often undermined by evolving…

Sound · Computer Science 2026-02-12 Qizhou Wang, Hanxun Huang, Guansong Pang, Sarah Erfani, Christopher Leckie

Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic…

Sound · Computer Science 2022-10-17 Emily R. Bartusiak, Edward J. Delp

In this paper the current status and open challenges of synthetic speech detection are addressed. The work comprises an initial analysis of available open datasets and of existing detection methods, a description of the requirements for new…

Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency…

Mainstream zero-shot TTS production systems like Voicebox and Seed-TTS achieve human parity speech by leveraging Flow-matching and Diffusion models, respectively. Unfortunately, human-level audio synthesis leads to identity misuse and…

Conversational speech synthesis (CSS) aims to synthesize both contextually appropriate and expressive speech, and considerable efforts have been made to enhance the understanding of conversational context. However, existing CSS systems are…

Sound · Computer Science 2025-02-28 Weihao Wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu

Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever, leading to possible threats and dangers from malicious users. In the audio field, we are…

Sound · Computer Science 2023-07-31 Daniele Mari, Davide Salvi, Paolo Bestagini, Simone Milani

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-15 Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio.…

With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-25 Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality device-degraded speech into…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-23 Haoyu Li, Junichi Yamagishi

Audio generation systems now create very realistic soundscapes that can enhance media production, but also pose potential risks. Several studies have examined deepfakes in speech or singing voice. However, environmental sounds have…

Sound · Computer Science 2025-09-30 Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Haohe Liu, Wenwu Wang, Mark D Plumbley