Related papers: Synthesized Speech Detection Using Convolutional T…

Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs

With the huge technological advances introduced by deep learning in audio & speech processing, many novel synthetic speech techniques achieved incredible realistic results. As these methods generate realistic fake human voices, they can be…

Sound · Computer Science 2023-09-18 Md Awsafur Rahman, Bishmoy Paul, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, Shaikh Anowarul Fattah, Mohammad Saquib

SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers

In recent years, Speech Emotion Recognition (SER) has been investigated mainly transforming the speech signal into spectrograms that are then classified using Convolutional Neural Networks pretrained on generic images and fine tuned with…

Sound · Computer Science 2022-11-07 A. Arezzo, S. Berretti

Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer

Many deep learning synthetic speech generation tools are readily available. The use of synthetic speech has caused financial fraud, impersonation of people, and misinformation to spread. For this reason forensic methods that can detect…

Sound · Computer Science 2024-02-23 Amit Kumar Singh Yadav, Ziyue Xiang, Kratika Bhagtani, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

FairSSD: Understanding Bias in Synthetic Speech Detectors

Methods that can generate synthetic speech which is perceptually indistinguishable from speech recorded by a human speaker, are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit…

Computer Vision and Pattern Recognition · Computer Science 2024-04-18 Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition

This paper presents a novel framework for multi-talker automatic speech recognition without the need for auxiliary information. Serialized Output Training (SOT), a widely used approach, suffers from recognition errors due to speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-10 Asahi Sakuma, Hiroaki Sato, Ryuga Sugano, Tadashi Kumano, Yoshihiko Kawai, Tetsuji Ogawa

Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

Digital technology has made possible unimaginable applications come true. It seems exciting to have a handful of tools for easy editing and manipulation, but it raises alarming concerns that can propagate as speech clones, duplicates, or…

Machine Learning · Computer Science 2021-04-13 Arun Kumar Singh, Priyanka Singh

Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic…

Sound · Computer Science 2022-10-17 Emily R. Bartusiak, Edward J. Delp

Detection of AI Synthesized Hindi Speech

The recent advancements in generative artificial speech models have made possible the generation of highly realistic speech signals. At first, it seems exciting to obtain these artificially synthesized signals such as speech clones or deep…

Sound · Computer Science 2022-03-09 Karan Bhatia, Ansh Agrawal, Priyanka Singh, Arun Kumar Singh

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-25 Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks

Modern text-to-speech (TTS) and voice conversion (VC) systems produce natural sounding speech that questions the security of automatic speaker verification (ASV). This makes detection of such synthetic speech very important to safeguard ASV…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-22 Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li

Transformer Based Machine Fault Detection From Audio Input

In recent years, Sound AI is being increasingly used to predict machine failures. By attaching a microphone to the machine of interest, one can get real time data on machine behavior from the field. Traditionally, Convolutional Neural Net…

Sound · Computer Science 2026-04-15 Kiran Voderhobli Holla

Using Deep Learning Techniques and Inferential Speech Statistics for AI Synthesised Speech Recognition

The recent developments in technology have rewarded us with amazing audio synthesis models like TACOTRON and WAVENETS. On the other side, it poses greater threats such as speech clones and deep fakes, that may go undetected. To tackle…

Machine Learning · Computer Science 2021-07-27 Arun Kumar Singh, Priyanka Singh, Karan Nathwani

Deepfake audio detection by speaker verification

Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays, that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our…

Sound · Computer Science 2022-09-29 Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features

With the rapid advancement in synthetic speech generation technologies, great interest in differentiating spoof speech from the natural speech is emerging in the research community. The identification of these synthetic signals is a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-06 Tadipatri Uday Kiran Reddy, Sahukari Chaitanya Varun, Kota Pranav Kumar Sankala Sreekanth, Kodukula Sri Rama Murty

Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection

The rapid spread of media content synthesis technology and the potentially damaging impact of audio and video deepfakes on people's lives have raised the need to implement systems able to detect these forgeries automatically. In this work…

Sound · Computer Science 2022-11-01 Luigi Attorresi, Davide Salvi, Clara Borrelli, Paolo Bestagini, Stefano Tubaro

Controllable Context-aware Conversational Speech Synthesis

In spoken conversations, spontaneous behaviors like filled pause and prolongations always happen. Conversational partner tends to align features of their speech with their interlocutor which is known as entrainment. To produce human-like…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-22 Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection

The recent integration of generative neural strategies and audio processing techniques have fostered the widespread of synthetic speech synthesis or transformation algorithms. This capability proves to be harmful in many legal and…

Sound · Computer Science 2022-10-07 Daniele Mari, Federica Latora, Simone Milani

Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

The awareness for biased ASR datasets or models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an…

Computation and Language · Computer Science 2023-03-03 Philipp Klumpp, Pooja Chitkara, Leda Sarı, Prashant Serai, Jilong Wu, Irina-Elena Veliche, Rongqing Huang, Qing He

Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content

Recent advancements in synthetic speech generation have led to the creation of forged audio data that are almost indistinguishable from real speech. This phenomenon poses a new challenge for the multimedia forensics community, as the misuse…

Sound · Computer Science 2024-02-09 Davide Salvi, Temesgen Semu Balcha, Paolo Bestagini, Stefano Tubaro

Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features

Synthetic-voice cloning technologies have seen significant advances in recent years, giving rise to a range of potential harms. From small- and large-scale financial fraud to disinformation campaigns, the need for reliable methods to…

Sound · Computer Science 2023-09-28 Sarah Barrington, Romit Barua, Gautham Koorma, Hany Farid