We predict discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. We present two methods for developing segmentation algorithms from training data: hand tuning and machine learning. When multiple types of features are used, results approach human performance on an independent test set (both methods), and using cross-validation (machine learning).
@article{arxiv.cmp-lg/9505025,
title = {Combining Multiple Knowledge Sources for Discourse Segmentation},
author = {Diane J. Litman and Rebecca J. Passonneau},
journal = {arXiv preprint arXiv:cmp-lg/9505025},
year = {1995}
}
Comments: 8 pages. Self-contained LaTeX source. To appear in Proceedings of the 33rd ACL, 1995. (This replacement version revised so that no lines exceed 80 characters.)