Representing Text Chunks

Computation and Language 2007-05-23 v1

Abstract

Dividing sentences into chunks of words is a useful preprocessing step for parsing, information extraction and information retrieval. Ramshaw and Marcus (1995) introduced a "convenient" data representation for chunking by converting it to a tagging task. In this paper we examine seven different data representations for the problem of recognizing noun phrase chunks. We show that the choice of data representation has a minor influence on chunking performance. However, equipped with the most suitable data representation, our memory-based learning chunker was able to improve on the best published chunking results for a standard data set.
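
To make the idea of "converting chunking to a tagging task" concrete, the sketch below turns noun phrase spans into one tag per token. It is only an illustration: the abstract does not name the seven representations, so the two variants shown here (commonly called IOB1 and IOB2) and the function name `chunks_to_iob` are assumptions, not the authors' code.

```python
from typing import List, Tuple


def chunks_to_iob(n_tokens: int,
                  chunks: List[Tuple[int, int]],
                  variant: str = "IOB2") -> List[str]:
    """Encode chunk spans as per-token tags (hypothetical helper).

    chunks: non-overlapping (start, end) token spans, end exclusive.
    variant: "IOB2" tags the first token of every chunk with B;
             "IOB1" uses B only when a chunk directly follows another chunk.
    """
    if variant not in ("IOB1", "IOB2"):
        raise ValueError(f"unknown variant: {variant}")

    tags = ["O"] * n_tokens          # tokens outside any chunk
    prev_end = None                  # end index of the previous chunk
    for start, end in sorted(chunks):
        needs_b = variant == "IOB2" or start == prev_end
        tags[start] = "B" if needs_b else "I"
        for i in range(start + 1, end):
            tags[i] = "I"            # remaining tokens inside the chunk
        prev_end = end
    return tags


tokens = ["He", "gave", "the", "boy", "a", "book"]
np_chunks = [(0, 1), (2, 4), (4, 6)]   # [He] gave [the boy] [a book]

iob2 = chunks_to_iob(len(tokens), np_chunks, "IOB2")
iob1 = chunks_to_iob(len(tokens), np_chunks, "IOB1")
print(list(zip(tokens, iob2)))
print(list(zip(tokens, iob1)))
```

Both encodings describe the same chunks; they differ only in when the boundary tag B is emitted, which is the kind of representational choice the paper compares.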

Cite

@article{arxiv.cs/9907006,
  title  = {Representing Text Chunks},
  author = {Erik F. Tjong Kim Sang and Jorn Veenstra},
  journal= {arXiv preprint arXiv:cs/9907006},
  year   = {2007}
}

Comments

7 pages