An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams

Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Online mining of association rules over data streams is an important issue in the area of data mining, where an association rule means that the presence of some items in a transaction implies the presence of other items in the same transaction. There are many a...


Bibliographic Details
Main Authors:	Wei-hau Peng, 彭偉豪
Other Authors:	Ye-In Chang
Format:	Others
Language:	en_US
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/23ya2j

id	ndltd-TW-097NSYS5392015
record_format	oai_dc
spelling	ndltd-TW-097NSYS5392015 2019-05-29T03:42:52Z http://ndltd.ncl.edu.tw/handle/23ya2j An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams 一個於資料串流中有效率地以集合晶格來探勘封閉頻繁集的方法 Wei-hau Peng 彭偉豪 Master's, National Sun Yat-sen University, Department of Computer Science and Engineering, 97 Ye-In Chang 張玉盈 2009 學位論文 ; thesis 76 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Online mining of association rules over data streams is an important issue in the area of data mining, where an association rule means that the presence of some items in a transaction implies the presence of other items in the same transaction. Association rules in data streams have many applications, such as market analysis, network security, sensor networks, and web tracking. Mining closed frequent itemsets extends the mining of association rules: it aims to find a subset of the frequent itemsets from which all frequent itemsets can be derived. Formally, a closed frequent itemset is a frequent itemset that has no proper superset with the same support. Since data streams are continuous, high-speed, and unbounded, archiving everything from a data stream is impossible. That is, the stream can be scanned only once, and the data must be processed in main memory. Therefore, previous algorithms for mining closed frequent itemsets in traditional databases are not suitable for data streams. On the other hand, many applications are interested only in the most recent data, and the Sliding Window Model, which keeps only the most recent data within a window of a given size, meets this need. One well-known algorithm for mining closed frequent itemsets based on the sliding window model is the NewMoment algorithm. However, the NewMoment algorithm cannot mine closed frequent itemsets in data streams efficiently, since it generates not only closed frequent itemsets but also many unclosed frequent itemsets. Moreover, when the data in the sliding window is incrementally updated, the NewMoment algorithm needs to reconstruct the whole tree structure. Therefore, in this thesis, we propose a sliding window approach, the Subset-Lattice algorithm, which embeds the subset property into the lattice structure to mine closed frequent itemsets efficiently.
When data items are inserted, our proposed algorithm considers five kinds of set relations: (1) equivalent, (2) superset, (3) subset, (4) intersection, and (5) empty. Using these five relations, we identify closed frequent itemsets without generating unclosed frequent itemsets. Moreover, when the data in the sliding window is incrementally updated, our Subset-Lattice algorithm does not reconstruct the whole lattice structure. Therefore, our Subset-Lattice algorithm is more efficient than the Moment algorithm. Furthermore, we represent itemsets as bit patterns and use bit operations to speed up set checking. Our simulation results show that our Subset-Lattice algorithm needs less memory and less processing time than the NewMoment algorithm. When the window slides, the execution time can be reduced by up to 50%.
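The bit-pattern set checking mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `Relation`, `to_bits`, and `classify`, the item-to-bit mapping, and the choice to express each relation from the point of view of the incoming itemset relative to an existing lattice node are all assumptions made for illustration. Both itemsets are assumed to be non-empty.

```python
from enum import Enum


class Relation(Enum):
    """The five set relations named in the abstract (labels are illustrative)."""
    EQUIVALENT = "equivalent"
    SUPERSET = "superset"
    SUBSET = "subset"
    INTERSECTION = "intersection"
    EMPTY = "empty"


def to_bits(itemset, item_index):
    """Encode an itemset as an integer bit pattern (hypothetical helper).

    item_index maps each item to a bit position, e.g. {"a": 0, "b": 1}.
    """
    bits = 0
    for item in itemset:
        bits |= 1 << item_index[item]
    return bits


def classify(new_bits, node_bits):
    """Classify the incoming itemset relative to an existing node's itemset.

    Assumption: the relation describes the incoming itemset with respect to
    the node; the thesis may define the direction differently.
    """
    common = new_bits & node_bits  # items shared by both itemsets
    if new_bits == node_bits:
        return Relation.EQUIVALENT
    if common == node_bits:
        return Relation.SUPERSET       # incoming itemset contains the node
    if common == new_bits:
        return Relation.SUBSET         # incoming itemset is contained in the node
    if common:
        return Relation.INTERSECTION   # partial overlap only
    return Relation.EMPTY              # no shared items


# Usage: items a..d mapped to bit positions 0..3.
item_index = {"a": 0, "b": 1, "c": 2, "d": 3}
node = to_bits("ab", item_index)
for incoming in ("ab", "abc", "a", "ac", "cd"):
    print(incoming, classify(to_bits(incoming, item_index), node).value)
```

Each comparison costs a single bitwise AND plus a few integer equality tests, which is the kind of saving over element-by-element set comparison that the abstract attributes to bit operations.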
author2	Ye-In Chang
author_facet	Ye-In Chang Wei-hau Peng 彭偉豪
author	Wei-hau Peng 彭偉豪
spellingShingle	Wei-hau Peng 彭偉豪 An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
author_sort	Wei-hau Peng
title	An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
title_short	An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
title_full	An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
title_fullStr	An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
title_full_unstemmed	An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams
title_sort	efficient subset-lattice algorithm for mining closed frequent itemsets in data streams
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/23ya2j
work_keys_str_mv	AT weihaupeng anefficientsubsetlatticealgorithmforminingclosedfrequentitemsetsindatastreams AT péngwěiháo anefficientsubsetlatticealgorithmforminingclosedfrequentitemsetsindatastreams AT weihaupeng yīgèyúzīliàochuànliúzhōngyǒuxiàolǜdeyǐjíhéjīnggéláitànkānfēngbìpínfánjídefāngfǎ AT péngwěiháo yīgèyúzīliàochuànliúzhōngyǒuxiàolǜdeyǐjíhéjīnggéláitànkānfēngbìpínfánjídefāngfǎ AT weihaupeng efficientsubsetlatticealgorithmforminingclosedfrequentitemsetsindatastreams AT péngwěiháo efficientsubsetlatticealgorithmforminingclosedfrequentitemsetsindatastreams
_version_	1719192991679119360

Similar Items