Summary: | Master's === Chung Yuan Christian University === Graduate Institute of Information Engineering === 96 === In recent data stream applications such as network monitoring and stock and financial analysis, data often flow into the system continuously and rapidly. Because storage space is limited, a proper mechanism for data update and compression is required so that important information is preserved. The two earlier kinds of representative patterns, RP and δ-TCFI, both select large itemsets to represent their subsets, subject to a threshold. This paper combines the concept of representative patterns from static databases with techniques for pattern update and count estimation over data streams. We propose an algorithm for mining both types of representative patterns. Moreover, we adapt a data structure originally proposed for mining closed frequent patterns from static databases to the batch processing of transactions from data streams. With our mining algorithm, a frequent pattern can be compared efficiently with the representative patterns discovered so far. The experimental results show that the two types of representative patterns lead to different performance.
Mining δ-TCFI yields good efficiency, precision, and recall, whereas mining RP yields a lower error rate. Users can choose either type as the mining target according to their application needs.
|