Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New Algorithm

Richard Nock; Pascal Vaillant; Frank Nielsen; Claudia Henry

Computation and Language 2008-10-08 v1 Information Retrieval

Abstract

Without prior knowledge, distinguishing different languages may be a hard task, especially when their borders are permeable. We develop an extension of spectral clustering, a powerful unsupervised classification toolbox, that is shown to accurately resolve the task of soft language distinction. At the heart of our approach, we replace the usual hard membership assignment of spectral clustering with a soft, probabilistic assignment, which also has the advantage of bypassing a well-known complexity bottleneck of the method. Furthermore, our approach relies on a novel, convenient construction of a Markov chain out of a corpus. Extensive experiments with a readily available system clearly display the potential of the method, which yields a visually appealing soft distinction of the languages that may together make up a whole corpus.
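
The two ingredients named in the abstract, a Markov chain built from a corpus and a soft rather than hard spectral assignment, can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out: the bigram-based chain, the symmetrisation of counts, the rescaling of the second eigenvector to [0, 1], the toy corpus, and the function names are all assumptions made for illustration.

```python
import numpy as np

def bigram_chain(tokens):
    """Symmetric co-occurrence weights from adjacent tokens (assumed construction)."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    W = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        # Count each bigram in both directions so the random walk is reversible.
        W[index[a], index[b]] += 1.0
        W[index[b], index[a]] += 1.0
    return vocab, W

def soft_scores(W):
    """Soft two-way score in [0, 1] from the second eigenvector of P = D^-1 W."""
    d = W.sum(axis=1)
    # The symmetric matrix D^-1/2 W D^-1/2 has the same eigenvalues as P.
    S = W / np.sqrt(np.outer(d, d))
    _, U = np.linalg.eigh(S)        # eigenvalues come back in ascending order
    v = U[:, -2] / np.sqrt(d)       # right eigenvector of P, second-largest eigenvalue
    # Rescale instead of thresholding: this is the "soft" step of the sketch.
    return (v - v.min()) / (v.max() - v.min())

# Hypothetical toy corpus: two word groups joined only through the word "taxi".
tokens = "cat the cat the taxi le chat le chat".split()
vocab, W = bigram_chain(tokens)
scores = dict(zip(vocab, soft_scores(W)))
for word in vocab:
    print(f"{word:5s} {scores[word]:.3f}")
```

On this toy corpus the shared word `taxi` lands midway between the two groups, while `cat`/`the` and `chat`/`le` fall on opposite sides; which side is which depends on the arbitrary sign of the eigenvector.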

Keywords

natural language processing, natural language parsing, cluster analysis

Cite

@article{arxiv.0810.1261,
  title  = {Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New Algorithm},
  author = {Richard Nock and Pascal Vaillant and Frank Nielsen and Claudia Henry},
  journal= {arXiv preprint arXiv:0810.1261},
  year   = {2008}
}

Comments

6 pages, 7 embedded figures, LaTeX 2e using the ecai2006.cls document class and the algorithm2e.sty style file (+ standard packages like epsfig, amsmath, amssymb, amsfonts...). Extends the short version contained in the ECAI 2006 proceedings.
