An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams

Master's === National Sun Yat-sen University === Graduate Institute of Computer Science and Engineering === 97 === Online mining of association rules over data streams is an important issue in data mining, where an association rule means that the presence of some items in a transaction implies the presence of other items in the same transaction. There are many a...

Bibliographic Details
Main Authors: Wei-hau Peng, 彭偉豪
Other Authors: Ye-In Chang
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/23ya2j
id ndltd-TW-097NSYS5392015
record_format oai_dc
spelling ndltd-TW-097NSYS5392015 2019-05-29T03:42:52Z http://ndltd.ncl.edu.tw/handle/23ya2j An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams 一個於資料串流中有效率地以集合晶格來探勘封閉頻繁集的方法 Wei-hau Peng 彭偉豪 Master's, National Sun Yat-sen University, Graduate Institute of Computer Science and Engineering, 97 Ye-In Chang 張玉盈 2009 thesis 76 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Sun Yat-sen University === Graduate Institute of Computer Science and Engineering === 97 === Online mining of association rules over data streams is an important issue in data mining, where an association rule means that the presence of some items in a transaction implies the presence of other items in the same transaction. Association rules in data streams have many applications, such as market analysis, network security, sensor networks, and web tracking. Mining closed frequent itemsets extends association rule mining: it aims to find a subset of the frequent itemsets from which all frequent itemsets can be derived. Formally, a closed frequent itemset is a frequent itemset that has no proper superset with the same support. Since data streams are continuous, high-speed, and unbounded, archiving everything from a data stream is impossible. That is, the stream can be scanned only once, and it must be handled as a main-memory database. Therefore, previous algorithms for mining closed frequent itemsets in traditional databases are not suitable for data streams. On the other hand, many applications are interested in the most recent data, and the Sliding Window Model meets this need by retaining only the most recent data within a window of a given size. One well-known algorithm for mining closed frequent itemsets based on the sliding window model is the NewMoment algorithm. However, the NewMoment algorithm cannot mine closed frequent itemsets efficiently in data streams, since it generates many unclosed frequent itemsets along with the closed ones. Moreover, when the data in the sliding window is incrementally updated, the NewMoment algorithm needs to reconstruct the whole tree structure. Therefore, in this thesis, we propose a sliding window approach, the Subset-Lattice algorithm, which embeds the subset property into the lattice structure to mine closed frequent itemsets efficiently.
Our proposed algorithm considers five kinds of set relations when data items are inserted: (1) equivalent, (2) superset, (3) subset, (4) intersection, and (5) empty relation. Using these five relations, we identify closed frequent itemsets without generating unclosed frequent itemsets. Moreover, when the data in the sliding window is incrementally updated, our Subset-Lattice algorithm does not reconstruct the whole lattice structure. Therefore, our Subset-Lattice algorithm is more efficient than the Moment algorithm. Furthermore, we represent itemsets as bit patterns and use bit operations to speed up set checking. Our simulation results show that the Subset-Lattice algorithm needs less memory and less processing time than the NewMoment algorithm. When the window slides, the execution time can be reduced by up to 50%.
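As an illustration only, the five set relations and the bit-pattern representation mentioned in the description might be sketched as follows. This is not the thesis's implementation: the names `Relation`, `to_bits`, `classify`, and `item_index` are hypothetical, and the convention that "superset" means the incoming itemset contains the existing one is an assumption made for this sketch.

```python
from enum import Enum


class Relation(Enum):
    # Hypothetical labels for the five relations listed in the abstract.
    EQUIVALENT = "equivalent"
    SUPERSET = "superset"
    SUBSET = "subset"
    INTERSECTION = "intersection"
    EMPTY = "empty"


def to_bits(itemset, item_index):
    """Encode an itemset as an integer bit pattern, one bit per item."""
    bits = 0
    for item in itemset:
        bits |= 1 << item_index[item]
    return bits


def classify(new_bits, node_bits):
    """Relation of an incoming itemset to an existing one, using bit operations.

    Assumed convention: the result describes the incoming itemset
    (new_bits) relative to the existing itemset (node_bits).
    """
    common = new_bits & node_bits
    if new_bits == node_bits:
        return Relation.EQUIVALENT
    if common == node_bits:
        return Relation.SUPERSET      # incoming contains the existing itemset
    if common == new_bits:
        return Relation.SUBSET        # incoming is contained in the existing itemset
    if common != 0:
        return Relation.INTERSECTION  # partial overlap only
    return Relation.EMPTY             # no items in common


# Example item-to-bit mapping (assumed for illustration).
item_index = {"a": 0, "b": 1, "c": 2, "d": 3}
abc = to_bits({"a", "b", "c"}, item_index)
ab = to_bits({"a", "b"}, item_index)
ad = to_bits({"a", "d"}, item_index)
d = to_bits({"d"}, item_index)
```

With this encoding, each relation check costs one AND and a few integer comparisons regardless of how the itemsets are stored, which is the kind of speed-up the description attributes to bit operations.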
author2 Ye-In Chang
author_facet Ye-In Chang
Wei-hau Peng
彭偉豪
author Wei-hau Peng
彭偉豪
author_sort Wei-hau Peng
title An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
publishDate 2009
url http://ndltd.ncl.edu.tw/handle/23ya2j