Dictionary based methods for information extraction

Statistical Mechanics 2009-11-10 v2 Other Condensed Matter Information Retrieval Genomics Other Quantitative Biology

Abstract

In this paper we present a general method for information extraction that exploits the features of data compression techniques. We first define the so-called "dictionary" of a sequence and focus our attention on it. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences from which they were extracted (e.g. DNA strings). We then describe a procedure for string comparison between dictionary-created sequences (or "artificial texts") that gives very good results in several contexts. Finally, we present some results on self-consistent classification problems.
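
The abstract does not say which compression scheme defines the dictionary or how the comparison is scored. As a rough illustration only, the sketch below assumes an LZ78-style parsing to collect the phrases of a sequence and uses a Jaccard distance between the resulting phrase sets as a stand-in comparison. The names `lz78_dictionary` and `dictionary_distance` are hypothetical, and neither the parsing rule nor the distance should be read as the authors' procedure.

```python
def lz78_dictionary(sequence: str) -> set[str]:
    """Collect the phrases produced by an LZ78-style parsing of `sequence`.

    Assumption: LZ78 is used here only as a simple, well-known way to
    build a "dictionary"; the abstract does not name the scheme.
    """
    phrases: set[str] = set()
    current = ""
    for symbol in sequence:
        current += symbol
        # A phrase ends as soon as it has not been seen before.
        if current not in phrases:
            phrases.add(current)
            current = ""
    # Any unfinished phrase at the end is already in the set by construction.
    return phrases


def dictionary_distance(a: str, b: str) -> float:
    """Jaccard distance between the phrase sets of two sequences.

    Assumption: this is an illustrative stand-in, not the paper's
    "artificial text" comparison.
    """
    dict_a = lz78_dictionary(a)
    dict_b = lz78_dictionary(b)
    union = dict_a | dict_b
    if not union:
        return 0.0  # two empty sequences are treated as identical
    return 1.0 - len(dict_a & dict_b) / len(union)


if __name__ == "__main__":
    print(sorted(lz78_dictionary("abababab")))
    print(dictionary_distance("ACGTACGTACGT", "ACGTTTGACGTA"))
```

Under these assumptions, sequences that share many repeated substrings yield overlapping phrase sets and therefore a small distance, which is the intuition behind using dictionaries as a fingerprint of the source sequence.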

Citation

@article{arxiv.cond-mat/0402581,
  title  = {Dictionary based methods for information extraction},
  author = {A. Baronchelli and E. Caglioti and V. Loreto and E. Pizzi},
  journal= {arXiv preprint arXiv:cond-mat/0402581},
  year   = {2009}
}

Comments

7 pages, LaTeX, elsart style