Efficient Mining of Frequent Itemsets Using Only One Dynamic Prefix Tree

Bibliographic Details
Main Authors: Jun-Feng Qu, Bo Hang, Zhao Wu, Zhongbo Wu, Qiong Gu, Bo Tang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9216179/
Description
Summary: Frequent itemset mining is a fundamental problem in data mining because frequent itemsets are used extensively in reasoning, classification, clustering, and other tasks. Previous prefix-tree-based algorithms must construct many prefix trees to mine frequent itemsets, which is very time-consuming. In this paper, we propose a novel frequent itemset mining algorithm called DPT (Dynamic Prefix Tree), which uses only one prefix tree. We first introduce the concept of the post-conditional database of an itemset and analyze how an itemset's post-conditional database is distributed in a prefix tree representing a database. We then explain how DPT adjusts the prefix tree to mine frequent itemsets and present three optimization techniques. An interesting advantage of DPT is that, after slight modifications, the algorithm can directly output a prefix tree representing all frequent itemsets. Because it uses only one dynamic prefix tree, DPT avoids the high cost of constructing many prefix trees and thus achieves significant performance improvements. Experimental results show that DPT remarkably outperforms previous algorithms in both running time and memory usage, and that the prefix tree of all frequent itemsets output by DPT can be used more efficiently than the list of frequent itemsets output by previous algorithms.
ISSN: 2169-3536
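The summary says that a prefix tree representing all frequent itemsets can be used more efficiently than a flat list of them. The sketch below illustrates only that output representation. It is not DPT and does not use post-conditional databases. It builds the tree with a simple depth-first tidset-intersection (Eclat-style) search, which is a swapped-in technique. It assumes items are integers stored in ascending order, so each frequent itemset has exactly one root-to-node path, and each node holds that itemset's support. The names `ItemsetNode`, `mine_to_tree`, `support_of`, and `count_itemsets` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ItemsetNode:
    """Node of the output prefix tree; the root-to-node path spells one itemset."""
    support: int = 0
    children: dict[int, ItemsetNode] = field(default_factory=dict)


def mine_to_tree(transactions: list[set[int]], min_support: int) -> ItemsetNode:
    """Return a prefix tree of all frequent itemsets (Eclat-style search, NOT DPT)."""
    # Map each item to the set of transaction ids (its tidset) that contain it.
    tidsets: dict[int, set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    # The root stands for the empty itemset, contained in every transaction.
    root = ItemsetNode(support=len(transactions))
    # Keep frequent single items in ascending item order (assumed canonical order).
    frequent = sorted(
        (item, tids) for item, tids in tidsets.items() if len(tids) >= min_support
    )
    _extend(root, frequent, min_support)
    return root


def _extend(node: ItemsetNode, candidates: list[tuple[int, set[int]]],
            min_support: int) -> None:
    """Attach each candidate as a child, then recursively grow longer itemsets."""
    for k, (item, tids) in enumerate(candidates):
        child = ItemsetNode(support=len(tids))
        node.children[item] = child
        # Only items later in the order may extend this prefix; the support of
        # the extended itemset is the size of the intersected tidset.
        suffix = []
        for other, other_tids in candidates[k + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                suffix.append((other, common))
        _extend(child, suffix, min_support)


def support_of(root: ItemsetNode, itemset: set[int]) -> int | None:
    """Walk one path: cost grows with len(itemset), not with the number of itemsets."""
    node = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return None  # the itemset is not frequent
    return node.support


def count_itemsets(node: ItemsetNode) -> int:
    """Count frequent itemsets, i.e. all nodes below the (empty-itemset) root."""
    return sum(1 + count_itemsets(child) for child in node.children.values())


if __name__ == "__main__":
    db = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}]
    tree = mine_to_tree(db, min_support=2)
    print(count_itemsets(tree))         # 7
    print(support_of(tree, {1, 2, 3}))  # 2
    print(support_of(tree, {2, 1}))     # 3
```

In this toy example, itemsets that share a prefix (for instance {1}, {1, 2}, and {1, 2, 3}) share nodes. Looking up a support is a single path walk rather than a scan of a list. This is one plausible reason a tree-shaped output can be cheaper to query, offered here as an illustration rather than as the paper's measured result.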