An Efficient Data Structure for Fast Mining High Utility Itemsets
Abstract
In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information of an itemset and a utility upper bound to facilitate the mining of high utility itemsets. Based on PUN-lists, we present a method called MIP (Mining high utility Itemsets using PUN-Lists) for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be calculated efficiently by scanning its PUN-list, and the PUN-lists of long itemsets can be constructed quickly from the PUN-lists of shorter itemsets. Third, by using the utility upper bound stored in the PUN-lists as its pruning strategy, MIP discovers high utility itemsets directly from the search space, a set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that the PUN-list is very effective: on average, MIP is at least an order of magnitude faster than recently reported algorithms.
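The abstract does not spell out the layout of a PUN-list, so the following is only a minimal sketch of the general idea it describes: computing itemset utilities and pruning a set-enumeration tree with a utility upper bound. As a stand-in for the PUN-list bound, it assumes the standard transaction-weighted utility (TWU), and it rescans the database rather than using any list structure. The toy database, the profit table, and all function names are hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def transaction_utility(tx):
    """Total utility of all items in one transaction."""
    return sum(qty * unit_profit[item] for item, qty in tx.items())


def utility(itemset):
    """Utility of the itemset, summed over the transactions that contain it."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def twu(itemset):
    """Transaction-weighted utility: an upper bound on the utility of the
    itemset and of every superset (assumed here in place of the PUN-list bound)."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += transaction_utility(tx)
    return total


def mine(min_util):
    """Depth-first walk of the set-enumeration tree with upper-bound pruning."""
    items = sorted(unit_profit)
    result = {}

    def extend(prefix, start):
        for i in range(start, len(items)):
            candidate = prefix + (items[i],)
            if twu(candidate) < min_util:
                continue  # neither the candidate nor any superset can qualify
            u = utility(candidate)
            if u >= min_util:
                result[candidate] = u
            extend(candidate, i + 1)

    extend((), 0)
    return result
```

With a minimum utility of 12, `mine(12)` on this toy data keeps only `("a",)` and `("a", "c")`; branches such as `("a", "b")` are cut as soon as their upper bound falls below the threshold, so their supersets are never visited.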
Cite
@article{arxiv.1510.02188,
title = {An Efficient Data Structure for Fast Mining High Utility Itemsets},
author = {Zhi-Hong Deng and Shulei Ma and He Liu},
journal= {arXiv preprint arXiv:1510.02188},
year = {2015}
}
Comments
25 pages, 9 figures