Consensus Sequence Segmentation

Computation and Language 2013-12-31 v2

Abstract

In this paper we introduce a method to detect words or phrases in a given sequence of characters without knowing the lexicon. Our linear-time unsupervised algorithm relies entirely on statistical relationships among the characters in the input sequence to detect the locations of word boundaries. We compare our algorithm to previous approaches from the unsupervised sequence segmentation literature and report superior segmentation on a number of benchmarks.
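
The abstract does not say which statistic the algorithm uses, so the following is only a minimal sketch of the general idea of finding boundaries from character statistics alone. It is not the authors' consensus method. It assumes a simple stand-in technique: estimate the probability of each character given the previous one, and cut wherever that probability falls below a threshold. The names `transition_probs` and `segment` and the `threshold` parameter are hypothetical and chosen for illustration.

```python
from collections import Counter


def transition_probs(seq):
    """Estimate P(next | current) from adjacent-character counts."""
    pair_counts = Counter(zip(seq, seq[1:]))
    # Count each character only where it has a successor.
    first_counts = Counter(seq[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(seq, threshold=0.9):
    """Cut before any character whose transition probability is low.

    Assumption: a low P(next | current) hints at a word boundary.
    This is an illustrative heuristic, not the paper's algorithm.
    """
    if not seq:
        return []
    probs = transition_probs(seq)
    pieces, start = [], 0
    for i in range(1, len(seq)):
        if probs[(seq[i - 1], seq[i])] < threshold:
            pieces.append(seq[start:i])
            start = i
    pieces.append(seq[start:])
    return pieces


print(segment("abcdababcdcdab"))
# ['ab', 'cd', 'ab', 'ab', 'cd', 'cd', 'ab']
```

In the toy input, `b` always follows `a` and `d` always follows `c`, while what follows `b` or `d` varies, so the cuts fall between the two-letter units. Like the method the abstract describes, this sketch needs no lexicon and makes a constant number of passes over the input.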

Cite

@article{arxiv.1308.3839,
  title  = {Consensus Sequence Segmentation},
  author = {Tamal Chowdhury and Rabindra Rakshit and Arko Banerjee},
  journal= {arXiv preprint arXiv:1308.3839},
  year   = {2013}
}

Comments

This paper has been withdrawn by the authors due to a data input error in Table 1.
