An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Online mining of association rules over data streams is an important issue in data mining, where an association rule means that the presence of some items in a transaction implies the presence of other items in the same transaction. There are many a...
Main Authors: | Wei-hau Peng 彭偉豪 |
---|---|
Other Authors: | Ye-In Chang 張玉盈 |
Format: | Others |
Language: | en_US |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/23ya2j |
id |
ndltd-TW-097NSYS5392015 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097NSYS5392015 2019-05-29T03:42:52Z http://ndltd.ncl.edu.tw/handle/23ya2j An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams 一個於資料串流中有效率地以集合晶格來探勘封閉頻繁集的方法 Wei-hau Peng 彭偉豪 Master's, National Sun Yat-sen University, Department of Computer Science and Engineering, 97 Ye-In Chang 張玉盈 2009 thesis 76 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Online mining of association rules over data streams is an important issue in data mining, where an association rule means that the presence of some items in a transaction implies the presence of other items in the same transaction. Association rules over data streams have many applications, such as market analysis, network security, sensor networks, and web tracking.
Mining closed frequent itemsets is an extension of association rule mining that aims to find a subset of the frequent itemsets from which all frequent itemsets can be derived. Formally, a closed frequent itemset is a frequent itemset that has no proper superset with the same support. Since data streams are continuous, high-speed, and unbounded, archiving everything from a data stream is impossible. That is, the stream can be scanned only once, and the data must be processed in main memory. Therefore, previous algorithms for mining closed frequent itemsets in traditional databases are not suitable for data streams. On the other hand, many applications are interested only in the most recent data, and the Sliding Window Model handles this case by keeping only the most recent data within a window of a given size. One well-known algorithm for mining closed frequent itemsets based on the sliding window model is the NewMoment algorithm. However, the NewMoment algorithm cannot mine closed frequent itemsets in data streams efficiently, since it generates many unclosed frequent itemsets in addition to the closed ones. Moreover, when the data in the sliding window is incrementally updated, the NewMoment algorithm needs to reconstruct the whole tree structure.
Therefore, in this thesis, we propose a sliding window approach, the Subset-Lattice algorithm, which embeds the subset property into the lattice structure to mine closed frequent itemsets efficiently. Our proposed algorithm considers five kinds of set relations when data items are inserted: (1) equivalent, (2) superset, (3) subset, (4) intersection, and (5) empty. Using these five relations, we identify closed frequent itemsets without generating unclosed frequent itemsets.
Moreover, when the data in the sliding window is incrementally updated, our Subset-Lattice algorithm does not reconstruct the whole lattice structure. Therefore, our Subset-Lattice algorithm is more efficient than the NewMoment algorithm. Furthermore, we represent itemsets as bit patterns and use bit operations to speed up set checking. Our simulation results show that the Subset-Lattice algorithm needs less memory and less processing time than the NewMoment algorithm. When the window slides, the execution time can be reduced by up to 50%.
|
author2 |
Ye-In Chang |
author_facet |
Ye-In Chang Wei-hau Peng 彭偉豪 |
author |
Wei-hau Peng 彭偉豪 |
author_sort |
Wei-hau Peng |
title |
An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/23ya2j |
_version_ |
1719192991679119360 |
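
The abstract's definition of a closed frequent itemset, and its idea of checking set relations with bit operations, can be illustrated with a small Python sketch. This is not the Subset-Lattice algorithm from the thesis: the record does not describe the lattice structure or its update rules, so the closed-itemset function below is a naive brute-force baseline over one window. The function names (`to_bits`, `relation`, `closed_frequent_itemsets`), the item-to-bit index, and the reading of each relation as "first pattern relative to second" are all assumptions made for illustration.

```python
from itertools import combinations


def to_bits(itemset, index):
    """Encode an itemset as an integer bit pattern.

    `index` is a hypothetical mapping from item to bit position.
    """
    bits = 0
    for item in itemset:
        bits |= 1 << index[item]
    return bits


def relation(a, b):
    """Classify bit pattern `a` relative to `b` using only bit operations.

    Returns one of the five relations named in the abstract. The direction
    (a relative to b) is an assumption of this sketch.
    """
    if a == b:
        return "equivalent"
    common = a & b
    if common == b:
        return "superset"      # a contains every item of b
    if common == a:
        return "subset"        # every item of a is in b
    if common:
        return "intersection"  # some shared items, neither contains the other
    return "empty"             # no shared items


def closed_frequent_itemsets(window, min_support):
    """Brute-force closed frequent itemsets of the transactions in `window`.

    Exponential in the number of distinct items; meant only to make the
    definition concrete, not to be efficient.
    """
    items = sorted(set().union(*window))
    index = {item: i for i, item in enumerate(items)}
    encoded = [to_bits(t, index) for t in window]

    # Support of every frequent itemset, keyed by bit pattern.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            bits = to_bits(combo, index)
            count = sum(1 for t in encoded if (t & bits) == bits)
            if count >= min_support:
                support[bits] = count

    # Keep itemsets with no proper superset of equal support.
    closed = {}
    for bits, count in support.items():
        has_equal_superset = any(
            other != bits and (other & bits) == bits and support[other] == count
            for other in support
        )
        if not has_equal_superset:
            members = frozenset(i for i in items if (bits >> index[i]) & 1)
            closed[members] = count
    return closed
```

For a window holding the transactions {a, b, c}, {a, b}, {a, c}, and {a} with a minimum support of 2, the frequent itemsets are a, b, c, ab, and ac, but b and c are not closed because ab and ac have the same support. A sliding window could be modeled by keeping the transactions in a `collections.deque` with a `maxlen`, although recomputing from scratch on each slide is exactly the cost the thesis aims to avoid.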