
Related papers: A statistical learning algorithm for word segmenta…

200 papers

A statistical model for segmentation and word discovery in continuous speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described. Results of empirical tests showing that the…

Computation and Language · Computer Science 2007-05-23 Anand Venkataraman

This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text.…

Computation and Language · Computer Science 2007-05-23 Michael R. Brent

Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and…

Computation and Language · Computer Science 2018-12-04 Yerai Doval, Carlos Gómez-Rodríguez

In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…

Computation and Language · Computer Science 2013-12-31 Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition,…

Computation and Language · Computer Science 2021-09-22 Ramon Sanabria, Hao Tang, Sharon Goldwater

Finding word boundaries in continuous speech is challenging as there is little or no equivalent of a 'space' delimiter between words. Popular Bayesian non-parametric models for text segmentation use a Dirichlet process to jointly segment…

Computation and Language · Computer Science 2022-06-24 Robin Algayres, Tristan Ricoul, Julien Karadayi, Hugo Laurençon, Salah Zaiem, Abdelrahman Mohamed, Benoît Sagot, Emmanuel Dupoux

Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries…

Computation and Language · Computer Science 2023-10-31 Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang

A statistical model for segmentation and word discovery in child directed speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described and results of empirical tests showing…

Computation and Language · Computer Science 2007-05-23 Anand Venkataraman

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly fully…

Sound · Computer Science 2020-05-08 Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To…

cmp-lg · Computer Science 2008-02-03 Doug Beeferman, Adam Berger, John Lafferty

Prior methods to text segmentation are mostly at token level. Despite the adequacy, this nature limits their full potential to capture the long-term dependencies among segments. In this work, we propose a novel framework that incrementally…

Computation and Language · Computer Science 2021-04-16 Yangming Li, Lemao Liu, Kaisheng Yao

In this paper, we propose a spoken term detection algorithm for simultaneous prediction and localization of in-vocabulary and out-of-vocabulary terms within an audio segment. The proposed algorithm infers whether a term was uttered within a…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-11 Tzeviya Sylvia Fuchs, Yael Segal, Joseph Keshet

Automatic segmentation of text into minimal content-bearing units is an unsolved problem even for languages like English. Spaces between words offer an easy first approximation, but this approximation is not good enough for machine…

cmp-lg · Computer Science 2008-02-03 I. Dan Melamed

Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a…

Computation and Language · Computer Science 2013-09-19 Grzegorz Chrupała

In this paper we propose a learning paradigm for the problem of understanding spoken language. The basis of the work is in a formalization of the understanding problem as a communication problem. This results in the definition of a…

cmp-lg · Computer Science 2008-02-03 Roberto Pieraccini, Esther Levin

Language models provide a key framework for studying linguistic theories based on prediction, but phonological analysis using large language models (LLMs) is difficult; there are few phonological benchmarks beyond English and the standard…

Computation and Language · Computer Science 2025-06-13 Zébulon Goriely, Paula Buttery

This project explores the nature of language acquisition in computers, guided by techniques similar to those used in children. While existing natural language processing methods are limited in scope and understanding, our system aims to…

Computation and Language · Computer Science 2012-06-04 Megan Belzner, Sean Colin-Ellerin, Jorge H. Roman

We predict discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. We present two methods for developing segmentation algorithms from training data: hand tuning and machine learning.…

cmp-lg · Computer Science 2008-02-03 Diane J. Litman, Rebecca J. Passonneau

Due to the absence of explicit word boundaries in the speech stream, the task of segmenting spoken sentences into word units without text supervision is particularly challenging. In this work, we leverage the most recent self-supervised…

Computation and Language · Computer Science 2023-10-10 Robin Algayres, Pablo Diego-Simon, Benoit Sagot, Emmanuel Dupoux

Automated discourse analysis tools based on Natural Language Processing (NLP) aiming at the diagnosis of language-impairing dementias generally extract several textual metrics of narrative transcripts. However, the absence of sentence…

Computation and Language · Computer Science 2017-08-17 Marcos Vinícius Treviso, Christopher Shulby, Sandra Maria Aluísio