A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams

Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === academic year 99 === Online mining of maximal frequent itemsets over data streams is an important problem in data mining. A maximal frequent itemset is an itemset whose support is greater than or equal to the minimum support and which is not a subset of any other frequent ite...

Bibliographic Details
Main Authors:	Pei-Ying Lin, 林佩穎
Other Authors:	Ye-In Chang
Format:	Others
Language:	en_US
Published:	2011
Online Access:	http://ndltd.ncl.edu.tw/handle/22045887069370251404

id	ndltd-TW-099NSYS5392025
record_format	oai_dc
spelling	ndltd-TW-099NSYS5392025 2015-10-19T04:03:18Z http://ndltd.ncl.edu.tw/handle/22045887069370251404 A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams 一個以集合檢查來從資料串流中搜尋最大頻繁項目集的方法 (A Method for Searching Maximal Frequent Itemsets from Data Streams by Set Checking) Pei-Ying Lin 林佩穎 Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, academic year 99 Ye-In Chang 張玉盈 2011 Degree thesis ; thesis 79 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === academic year 99 === Online mining of maximal frequent itemsets over data streams is an important problem in data mining. A maximal frequent itemset is an itemset whose support is greater than or equal to the minimum support and which is not a subset of any other frequent itemset. Algorithms designed to mine maximal frequent itemsets from traditional databases are not suitable for data streams. Because data streams are (1) continuous, (2) fast, (3) unbounded in size, (4) real-time, and (5) single-pass, mining them poses several new challenges. First, it is unrealistic to keep the entire stream in main memory or even in secondary storage, since a data stream arrives continuously and its volume is unbounded. Second, traditional mining methods that scan a stored dataset multiple times are infeasible, since streaming data can be passed over only once. Third, mining streams requires fast, real-time processing to keep up with the high data arrival rate, and mining results are expected within a short response time. To mine maximal frequent itemsets from data streams under the landmark window model, Mao et al. proposed the INSTANT algorithm. In the landmark window model, knowledge discovery is performed on the data between a fixed starting time and the present. The advantage of the landmark window model is that its results are more accurate than those obtained under the other models. The structure of the INSTANT algorithm is simple and saves considerable memory, but mining the maximal frequent itemsets takes a long time: whenever a new transaction arrives, INSTANT performs too many comparisons against the old transactions.
In this thesis, we propose the Set-Checking algorithm to mine maximal frequent itemsets from data streams under the landmark window model. We store our information in a lattice, which records the subset relationship between each child node and its parent node; each node records an itemset and its support. When a new transaction arrives, we consider five relations between the transaction and the itemsets in the lattice: (1) equivalent, (2) superset, (3) subset, (4) intersection, and (5) empty (disjoint). Based on these five relations, we can insert the transaction into the lattice and update the supports efficiently. Our simulation results show that the Set-Checking algorithm has a shorter processing time than the INSTANT algorithm.
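The five relations and the definition of a maximal frequent itemset can be illustrated with a small Python sketch. It is not the author's Set-Checking algorithm and does not build the lattice: the names Relation, classify and maximal_frequent_itemsets are hypothetical, the relations are assumed to be read from the new transaction's point of view (for example, "superset" is taken to mean that the transaction properly contains the stored itemset), and support is assumed to be an absolute count over every transaction since the start of the stream (the landmark window). The brute-force counter enumerates every subset of every transaction, so it is exponential in transaction length and is meant only as a reference for checking results on toy data.

```python
from collections import Counter
from enum import Enum
from itertools import combinations


class Relation(Enum):
    """Hypothetical labels for the five relations named in the abstract."""
    EQUIVALENT = "equivalent"      # transaction equals the stored itemset
    SUPERSET = "superset"          # transaction properly contains the itemset (assumed direction)
    SUBSET = "subset"              # transaction is properly contained in the itemset
    INTERSECTION = "intersection"  # they share items, but neither contains the other
    EMPTY = "empty"                # they share no items at all


def classify(transaction: frozenset, itemset: frozenset) -> Relation:
    """Classify how a new transaction relates to an itemset stored in a lattice node."""
    if transaction == itemset:
        return Relation.EQUIVALENT
    if transaction > itemset:      # proper superset test on frozensets
        return Relation.SUPERSET
    if transaction < itemset:      # proper subset test on frozensets
        return Relation.SUBSET
    if transaction & itemset:      # non-empty overlap
        return Relation.INTERSECTION
    return Relation.EMPTY


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force reference over a landmark window (all transactions seen so far).

    Counts every non-empty subset of every transaction, keeps those whose support
    is at least min_support, then keeps only itemsets with no frequent proper superset.
    """
    support = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                support[frozenset(combo)] += 1
    frequent = {s for s, c in support.items() if c >= min_support}
    return {s for s in frequent if not any(s < other for other in frequent)}


if __name__ == "__main__":
    stream = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
    result = maximal_frequent_itemsets(stream, min_support=2)
    print(sorted(sorted(s) for s in result))  # prints [['a', 'b'], ['a', 'c']]
    print(classify(frozenset("ac"), frozenset("ab")).value)  # prints intersection
```

In this toy stream, {a}, {b} and {c} are frequent but not maximal, because each is contained in the frequent itemset {a, b} or {a, c}. A lattice-based method would aim to reach the same answer incrementally, using the relation between each new transaction and the stored itemsets instead of recounting every subset; how the thesis performs that update is not reproduced here.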
author2	Ye-In Chang
author_facet	Ye-In Chang Pei-Ying Lin 林佩穎
author	Pei-Ying Lin 林佩穎
author_sort	Pei-Ying Lin
title	A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams
publishDate	2011
url	http://ndltd.ncl.edu.tw/handle/22045887069370251404
