Optimal Multi-Paragraph Text Segmentation by Dynamic Programming

Computation and Language 2007-05-23 v1

Abstract

Several methods exist for calculating a similarity curve, or a sequence of similarity values, representing the lexical cohesion of successive text constituents, e.g., paragraphs. Methods for deciding the locations of fragment boundaries are, however, scarce. We propose a fragmentation method based on dynamic programming. The method is theoretically sound and guaranteed to provide an optimal splitting given a similarity curve, a preferred fragment length, and a defined cost function. The method is especially useful when control over fragment size is important.
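The abstract does not spell out the cost function, so the following is only a minimal sketch of how a dynamic program of this kind could look, not the paper's actual formulation. It assumes that cutting between two paragraphs costs their similarity value, and that each fragment pays a quadratic penalty for deviating from the preferred length. The function name `segment`, its parameters, and the default penalty are all hypothetical.

```python
from itertools import accumulate


def segment(lengths, sims, preferred, length_cost=None):
    """Split paragraphs into fragments with minimal total cost.

    lengths[i] is the size of paragraph i (e.g. in words).
    sims[i] is the similarity between paragraphs i and i + 1.
    Returns (total_cost, boundaries), where a boundary b means that a
    new fragment starts at paragraph index b.
    """
    n = len(lengths)
    if len(sims) != max(n - 1, 0):
        raise ValueError("need exactly one similarity per adjacent pair")
    if length_cost is None:
        # Assumed penalty: squared relative deviation from the preferred size.
        def length_cost(size):
            return ((size - preferred) / preferred) ** 2

    # prefix[k] is the total size of the first k paragraphs.
    prefix = [0] + list(accumulate(lengths))
    # best[k] is the minimal cost of segmenting the first k paragraphs.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(end):
            # Assumed boundary cost: the similarity across the cut.
            cut = sims[start - 1] if start > 0 else 0.0
            cost = best[start] + cut + length_cost(prefix[end] - prefix[start])
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Follow the back-pointers to recover the chosen boundaries.
    boundaries = []
    pos = n
    while pos > 0:
        pos = back[pos]
        if pos > 0:
            boundaries.append(pos)
    boundaries.reverse()
    return best[n], boundaries
```

With four paragraphs of 100 words each, similarities `[0.9, 0.1, 0.9]`, and a preferred length of 200, `segment([100, 100, 100, 100], [0.9, 0.1, 0.9], 200)` places its single boundary at index 2, where the similarity curve dips. Because every prefix is solved exactly once and every possible last fragment is tried, the result is optimal for the chosen cost, at the price of a quadratic number of steps in the number of paragraphs.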

Cite

@article{arxiv.cs/9812005,
  title  = {Optimal Multi-Paragraph Text Segmentation by Dynamic Programming},
  author = {Oskari Heinonen},
  journal= {arXiv preprint arXiv:cs/9812005},
  year   = {2007}
}

Comments

5 pages, 3 eps figures, LaTeX2e; includes errata; uses colacl, epsf, times