Improving PPM Algorithm Using Dictionaries

Yichuan Hu; Jianzhong Zhang; Farooq Khan; Ying Li

Information Theory 2015-03-17 v2 math.IT

Abstract

We propose a method to improve traditional character-based PPM text compression algorithms. Treating a text file as a sequence of alternating words and non-words, our algorithm encodes non-words and word prefixes using character-based context models and encodes word suffixes using dictionary models. Because the dictionary models let the algorithm encode multiple characters as a whole, compression efficiency improves. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing; 2) it does not need any explicit codeword to signal switches between the context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Test results show that significant improvements can be obtained over character-based PPM, especially in low-order cases.
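To make the word/non-word split concrete, here is a toy sketch of which symbols each model would see. It is not the authors' coder. It performs no probability modeling or arithmetic coding, and it does not show how the paper avoids explicit switch codewords. Several details are assumptions for illustration only: words are taken to be maximal runs of ASCII letters, the prefix length `prefix_len=2` is arbitrary, the dictionary is a hypothetical frequency table built from training text, and a dictionary hit is represented as a hypothetical `("DICT", index)` symbol. The helpers `split_words_nonwords`, `build_dictionary` and `symbolize` are likewise hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a maximal run of ASCII letters; everything else is a
# non-word. The two classes are complementary, so consecutive runs alternate.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")
WORD_RE = re.compile(r"[A-Za-z]+")


def split_words_nonwords(text):
    """Split text into alternating word / non-word tokens (lossless)."""
    return TOKEN_RE.findall(text)


def build_dictionary(texts):
    """Hypothetical dictionary: word frequencies gathered from training texts."""
    counts = Counter()
    for t in texts:
        counts.update(tok for tok in split_words_nonwords(t) if WORD_RE.fullmatch(tok))
    return counts


def symbolize(text, dictionary, prefix_len=2):
    """Return the symbol stream a hybrid coder might see (no entropy coding).

    Non-words and the first `prefix_len` letters of each word remain single
    characters for the character-based context model. The rest of the word
    becomes one dictionary symbol when the whole word is among the dictionary
    words sharing that prefix; otherwise it falls back to characters.
    """
    symbols = []
    for tok in split_words_nonwords(text):
        if not WORD_RE.fullmatch(tok) or len(tok) <= prefix_len:
            symbols.extend(tok)  # non-word or short word: characters only
            continue
        prefix, suffix = tok[:prefix_len], tok[prefix_len:]
        symbols.extend(prefix)  # prefix goes to the context model
        # Candidate words sharing the prefix, most frequent first, ties alphabetical.
        candidates = sorted(
            (w for w in dictionary if w.startswith(prefix) and len(w) > prefix_len),
            key=lambda w: (-dictionary[w], w),
        )
        if tok in candidates:
            symbols.append(("DICT", candidates.index(tok)))  # whole suffix as one symbol
        else:
            symbols.extend(suffix)  # out-of-dictionary: fall back to characters
    return symbols


if __name__ == "__main__":
    dictionary = build_dictionary(["the theory of the thing"])
    print(symbolize("the theory", dictionary))
```

With the tiny training text above, the 10 characters of `"the theory"` become 7 symbols, because the suffixes `e` and `eory` are each replaced by a single dictionary index. The saving grows with word length, which matches the intuition that dictionary models help most when the character-based context model has little context to work with.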

Keywords

source coding, word embeddings, natural language parsing

Cite

@article{arxiv.1012.3790,
  title  = {Improving PPM Algorithm Using Dictionaries},
  author = {Yichuan Hu and Jianzhong Zhang and Farooq Khan and Ying Li},
  journal= {arXiv preprint arXiv:1012.3790},
  year   = {2015}
}

Comments

7 pages, 4 figures; longer version of the DCC 2011 paper