Related papers: FluentSpeech: Stutter-Oriented Automatic Speech Ed…

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, the current…

Sound · Computer Science 2023-09-25 Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency

Text-based speech editing (TSE) allows users to edit speech by modifying the corresponding text directly without altering the original recording. Current TSE techniques often focus on minimizing discrepancies between generated speech and…

Computation and Language · Computer Science 2024-12-10 Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning

Strong presentation skills are valuable and sought-after in workplace and classroom environments alike. Of the possible improvements to vocal presentations, disfluencies and stutters in particular remain one of the most common and prominent…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-25 Tedd Kourkounakis, Amirhossein Hajavi, Ali Etemad

Fluent: An AI Augmented Writing Tool for People who Stutter

Stuttering is a speech disorder which impacts the personal and professional lives of millions of people worldwide. To save themselves from stigma and discrimination, people who stutter (PWS) may adopt different strategies to conceal their…

Artificial Intelligence · Computer Science 2021-08-24 Bhavya Ghai, Klaus Mueller

Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders.…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-18 Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath

StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction

Over 70 million people worldwide experience stuttering, yet most automatic speech systems misinterpret disfluent utterances or fail to transcribe them accurately. Existing methods for stutter correction rely on handcrafted feature…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-06 Qianheng Xu

Automatic Disfluency Detection from Untranscribed Speech

Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. Stuttering is a speech disorder characterized by a high rate of disfluencies, but all individuals speak with some disfluencies and the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-03 Amrit Romana, Kazuhito Koishida, Emily Mower Provost

Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment

The automated classification of stuttered speech has significant implications for timely assessments providing assistance to speech language pathologists. Despite notable advancements in the field, the cases in which multiple disfluencies…

Sound · Computer Science 2025-02-27 Huma Ameer, Seemab Latif, Mehwish Fatima

STEAMROLLER: A Multi-Agent System for Inclusive Automatic Speech Recognition for People who Stutter

People who stutter (PWS) face systemic exclusion in today's voice-driven society, where access to voice assistants, authentication systems, and remote work tools increasingly depends on fluent speech. Current automatic speech recognition…

Computers and Society · Computer Science 2026-01-16 Ziqi Xu, Yi Liu, Yuekang Li, Ling Shi, Kailong Wang, Yongxin Zhao

Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-27 Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing automatic speech recognition (ASR) interfaces perform poorly…

Computation and Language · Computer Science 2022-11-18 Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation

Detecting and segmenting dysfluencies is crucial for effective speech therapy and real-time feedback. However, most methods only classify dysfluencies at the utterance level. We introduce StutterCut, a semi-supervised framework that…

Sound · Computer Science 2025-08-05 Suhita Ghosh, Melanie Jouaiti, Jan-Ole Perschewski, Sebastian Stober

AttentionStitch: How Attention Solves the Speech Editing Problem

The generation of natural and high-quality speech from text is a challenging problem in the field of natural language processing. In addition to speech generation, speech editing is also a crucial task, which requires the seamless and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-11 Antonios Alexos, Pierre Baldi

Speech Editing -- a Summary

With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing…

Sound · Computer Science 2024-07-25 Tobias Kässmann, Yining Liu, Danni Liu

Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

In recent years, advancements in the field of speech processing have led to cutting-edge deep learning algorithms with immense potential for real-world applications. The automated identification of stuttered speech is one of such…

Sound · Computer Science 2023-11-10 Huma Ameer, Seemab Latif, Rabia Latif, Sana Mukhtar

DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency

As text-based speech editing becomes increasingly prevalent, the demand for unrestricted free-text editing continues to grow. However, existing speech editing techniques encounter significant challenges, particularly in maintaining…

Sound · Computer Science 2024-09-23 Yang Chen, Yuhang Jia, Shiwan Zhao, Ziyue Jiang, Haoran Li, Jiarong Kang, Yong Qin

LatentSpeech: Latent Diffusion for Text-To-Speech Generation

Diffusion-based Generative AI gains significant attention for its superior performance over other generative techniques like Generative Adversarial Networks and Variational Autoencoders. While it has achieved notable advancements in fields…

Sound · Computer Science 2024-12-12 Haowei Lou, Helen Paik, Pari Delir Haghighi, Wen Hu, Lina Yao

Stuttering-Aware Automatic Speech Recognition for Indonesian Language

Automatic speech recognition systems have achieved remarkable performance on fluent speech but continue to degrade significantly when processing stuttered speech, a limitation that is particularly acute for low-resource languages like…

Computation and Language · Computer Science 2026-01-15 Fadhil Muhammad, Alwin Djuliansah, Adrian Aryaputra Hamzah, Kurniawati Azizah

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-23 Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jefferey Bigham

SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement

With the fast development of zero-shot text-to-speech technologies, it is possible to generate high-quality speech signals that are indistinguishable from the real ones. Speech editing, including speech insertion and replacement, appeals to…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-19 Kuan-Yu Chen, Jeng-Lin Li, De-Yan Lu, Jian-Jiun Ding