Related papers: Phoneme Boundary Detection using Learnable Segment…

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

We propose a self-supervised representation learning model for the task of unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-07 Felix Kreuk, Joseph Keshet, Yossi Adi

Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

Phoneme boundary detection has been studied due to its central role in various speech applications. In this work, we point out that this task needs to be addressed not only by algorithmic way, but also by evaluation metric. To this end, we…

Sound · Computer Science 2022-12-14 Hyeongju Kim, Hyeong-Seok Choi

Phoneme Segmentation Using Self-Supervised Speech Models

We apply transfer learning to the task of phoneme segmentation and demonstrate the utility of representations learned in self-supervised pre-training for the task. Our model extends transformer-style encoders with strategically placed…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-04 Luke Strgar, David Harwath

Back to Supervision: Boosting Word Boundary Detection through Frame Classification

Speech segmentation at both word and phoneme levels is crucial for various speech processing tasks. It significantly aids in extracting meaningful units from an utterance, thus enabling the generation of discrete elements. In this work we…

Machine Learning · Computer Science 2024-11-18 Simone Carnemolla, Salvatore Calcagno, Simone Palazzo, Daniela Giordano

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

Automatic detection of phoneme or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ self-supervised training methods, such as contrastive predictive coding (CPC), where the next frame…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-07 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Blind phoneme segmentation with temporal prediction errors

Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural network. Our approach consists in…

Computation and Language · Computer Science 2017-05-30 Paul Michel, Okko Räsänen, Roland Thiollière, Emmanuel Dupoux

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Typically, unsupervised segmentation of speech into the phone and word-like units are treated as separate tasks and are often done via different methods which do not fully leverage the inter-dependence of the two tasks. Here, we unify them…

Audio and Speech Processing · Electrical Eng. & Systems 2021-10-12 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching

We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can only access to the input utterances and a phoneme language model estimated from a non-overlapping…

Audio and Speech Processing · Electrical Eng. & Systems 2018-12-27 Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu

BabyLM's First Words: Word Segmentation as a Phonological Probing Task

Language models provide a key framework for studying linguistic theories based on prediction, but phonological analysis using large language models (LLMs) is difficult; there are few phonological benchmarks beyond English and the standard…

Computation and Language · Computer Science 2025-06-13 Zébulon Goriely, Paula Buttery

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and…

Computation and Language · Computer Science 2021-04-21 Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

Unsupervised Spoken Term Discovery on Untranscribed Speech

(Part of the abstract) In this thesis, we investigate the use of unsupervised spoken term discovery in tackling this problem. Unsupervised spoken term discovery aims to discover topic-related terminologies in a speech without knowing the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-01 Man-Ling Sung

Sequence Prediction with Neural Segmental Models

Segments that span contiguous parts of inputs, such as phonemes in speech, named entities in sentences, actions in videos, occur frequently in sequence prediction problems. Segmental models, a class of models that explicitly hypothesizes…

Computation and Language · Computer Science 2018-06-14 Hao Tang

Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech

The language acquisition literature shows that children do not build their lexicon by segmenting the spoken input into phonemes and then building up words from them, but rather adopt a top-down approach and start by segmenting word-like…

Computation and Language · Computer Science 2020-10-21 William N. Havard, Jean-Pierre Chevrot, Laurent Besacier

Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts

This paper is motivated by the automation of neuropsychological tests involving discourse analysis in the retellings of narratives by patients with potential cognitive impairment. In this scenario the task of sentence boundary detection in…

Computation and Language · Computer Science 2017-08-17 Marcos V. Treviso, Christopher D. Shulby, Sandra M. Aluisio

Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling

Phonetic error detection, a core subtask of automatic pronunciation assessment, identifies pronunciation deviations at the phoneme level. Speech variability from accents and dysfluencies challenges accurate phoneme recognition, with current…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-22 Xuanru Zhou, Jiachen Lian, Cheol Jun Cho, Tejas Prabhune, Shuhe Li, William Li, Rodrigo Ortiz, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary Miller, Maria Gorno-Tempini, Gopala Anumanchipalli

Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

Recent advances in machine learning and the availability of articulatory datasets allow vocal tract synthesis to be conditioned on phonetic sequences, a primary task of articulatory speech synthesis. However, quality assessment needs a…

Computation and Language · Computer Science 2026-05-21 Vinicius Ribeiro, Yves Laprie

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality,…

Sound · Computer Science 2023-02-17 Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-18 Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

Learning to Discover, Ground and Use Words with Segmental Neural Language Models

We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation…

Computation and Language · Computer Science 2019-06-19 Kazuya Kawakami, Chris Dyer, Phil Blunsom

Metric Learning for Phoneme Perception

Metric functions for phoneme perception capture the similarity structure among phonemes in a given language and therefore play a central role in phonology and psycho-linguistics. Various phenomena depend on phoneme similarity, such as…

Machine Learning · Computer Science 2018-09-24 Yair Lakretz, Gal Chechik, Evan-Gary Cohen, Alessandro Treves, Naama Friedmann