An Efficient Data Structure for Fast Mining High Utility Itemsets

Zhi-Hong Deng; Shulei Ma; He Liu

Databases 2015-10-09 v1 Data Structures and Algorithms

Abstract

In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information of an itemset and a utility upper bound to facilitate mining high utility itemsets. Based on PUN-lists, we present a method called MIP (Mining high utility Itemset using PUN-Lists) for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning its PUN-list, and the PUN-lists of long itemsets can be quickly constructed from the PUN-lists of short itemsets. Third, by employing the utility upper bound stored in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, called the set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that PUN-list is very effective: on average, MIP is at least an order of magnitude faster than recently reported algorithms.
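To make the terms in the abstract concrete, the following is a minimal brute-force sketch of high utility itemset mining with upper-bound pruning over a set-enumeration tree. It is not the PUN-list structure or the MIP algorithm, whose details the abstract does not give. The toy transactions, the unit profits, the function names, and the choice of transaction-weighted utility (TWU) as the upper bound are all assumptions made for illustration.

```python
# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1}


def transaction_utility(tx):
    """Total utility of one transaction (quantity * unit profit, summed)."""
    return sum(qty * profit[item] for item, qty in tx.items())


def itemset_utility(itemset):
    """Utility of an itemset: summed over transactions containing all its items."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def twu(itemset):
    """Transaction-weighted utility: an upper bound on the utility of the
    itemset and of every superset (assumed bound, not the paper's)."""
    return sum(
        transaction_utility(tx)
        for tx in transactions
        if all(item in tx for item in itemset)
    )


def high_utility_itemsets(min_util):
    """Depth-first walk of the set-enumeration tree with TWU pruning."""
    items = sorted(profit)
    result = {}

    def expand(prefix, start):
        for k in range(start, len(items)):
            candidate = prefix + (items[k],)
            if twu(candidate) < min_util:
                continue  # no extension of this candidate can reach min_util
            utility = itemset_utility(candidate)
            if utility >= min_util:
                result[candidate] = utility
            expand(candidate, k + 1)

    expand((), 0)
    return result


print(high_utility_itemsets(10))
```

This sketch rescans every transaction for each candidate, which is the repeated utility computation that a condensed per-itemset structure is meant to avoid.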

Keywords

association rule mining

Cite

@article{arxiv.1510.02188,
  title  = {An Efficient Data Structure for Fast Mining High Utility Itemsets},
  author = {Zhi-Hong Deng and Shulei Ma and He Liu},
  journal= {arXiv preprint arXiv:1510.02188},
  year   = {2015}
}

Comments

25 pages, 9 figures
