A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams
碩士 === 國立中山大學 === 資訊工程學系研究所 === 99 === Online mining the maximal frequent itemsets over data streams is an important problem in data mining. The maximal frequent itemset is the itemset which the support is large or equal to the minimal support and the itemset is not the subset or superse of each ite...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | en_US |
Published: |
2011
|
Online Access: | http://ndltd.ncl.edu.tw/handle/22045887069370251404 |
id |
ndltd-TW-099NSYS5392025 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099NSYS53920252015-10-19T04:03:18Z http://ndltd.ncl.edu.tw/handle/22045887069370251404 A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams 一個以集合檢查來從資料串流中搜尋最大頻繁項目集的方法 Pei-Ying Lin 林佩穎 碩士 國立中山大學 資訊工程學系研究所 99 Online mining the maximal frequent itemsets over data streams is an important problem in data mining. The maximal frequent itemset is the itemset which the support is large or equal to the minimal support and the itemset is not the subset or superse of each itemset. Previous algorithms to mine the maximal frequent itemsets in the traditional database are not suitable for data streams. Because data streams have some characteristics: (1) continuous (2) fast (3) no data limit (4) real time (5) searching once, mining data streams have many new challenges. First, they are unrealistic to keep the entire stream in the main memory or even in a secondary storage area, since a data stream comes continuously and the amount of data is unbounded. Second, traditional methods of mining on stored datasets by multiple scans are infeasible, since the streaming data is passed only once. Third, mining streams requires fast, real-time processing in order to keep up with the high data arrival rate and mining results are expected to be available within short response time. In order to solve mining maximal frequent itemsets from data streams using the landmark window model, Mao et. al. propose the INSTANT algorithm. In the landmark window model, knowledge discovery is performed based on the values between the beginning time and the present. The advantage of using the landmark window model is that the results are correct as compared to the other models. The structure of the INSTANT algorithm is simple and it can save many memory space. But it takes long time in mining the maximal frequent itemsets. When the new transactions comes, the number of comparisons between the old transactions of INSATNT algorithm is too much. In this thesis, we propose the Set-Checking algorithm to mine frequent itemsets from data streams using the landmark window model. We use the structure of lattice to store our information. The structure of lattice records the subset relationship between the child node and the father node. For every node, we can record the itemset and the support. When the new transaction comes, we consider five relations: (1) equivalent (2) superset (3) subset (4) intersection (5) empty relations. According to the lattice structure of the five sets , we can add the transaction and the renew support efficiently. From our simulation result, we find that the process time of our Set-Checking algorithm is faster than that of the INSTANT algorithm. Ye-In Chang 張玉盈 2011 學位論文 ; thesis 79 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others
|
sources |
NDLTD |
description |
碩士 === 國立中山大學 === 資訊工程學系研究所 === 99 === Online mining the maximal frequent itemsets over data streams is an important problem in data mining. The maximal frequent itemset is the itemset which the support is large or equal to the minimal support and the itemset is not the subset or superse of each itemset. Previous algorithms to mine the maximal frequent itemsets in the traditional database are not suitable for data streams. Because data streams have some characteristics: (1) continuous (2) fast (3) no data limit (4) real time (5) searching once, mining data streams have many new challenges. First, they are unrealistic to keep the entire stream in the main memory or even in a secondary storage area, since a data stream comes continuously and the amount of data is unbounded. Second, traditional methods of mining on stored datasets by multiple scans are infeasible, since the streaming data is passed only once. Third, mining streams requires fast, real-time processing in order to keep up with the high data arrival rate and mining results are expected to be available within short response time. In order to solve mining maximal frequent itemsets from data streams using the landmark window model, Mao et. al. propose the INSTANT algorithm. In the landmark window model, knowledge discovery is performed based on the values between the beginning time and the present. The advantage of using the landmark window model is that the results are correct as compared to the other models. The structure of the INSTANT algorithm is simple and it can save many memory space. But it takes long time in mining the maximal frequent itemsets. When the new transactions comes, the number of comparisons between the old transactions of INSATNT algorithm is too much. In this thesis, we propose the Set-Checking algorithm to mine frequent itemsets from data streams using the landmark window model. We use the structure of lattice to store our information. The structure of lattice records the subset relationship between the child node and the father node. For every node, we can record the itemset and the support. When the new transaction comes, we consider five relations: (1) equivalent (2) superset (3) subset (4) intersection (5) empty relations. According to the lattice structure of the five sets , we can add the transaction and the renew support efficiently. From our simulation result, we find that the process time of our Set-Checking algorithm is faster than that of the INSTANT algorithm.
|
author2 |
Ye-In Chang |
author_facet |
Ye-In Chang Pei-Ying Lin 林佩穎 |
author |
Pei-Ying Lin 林佩穎 |
spellingShingle |
Pei-Ying Lin 林佩穎 A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams |
author_sort |
Pei-Ying Lin |
title |
A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams |
title_short |
A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams |
title_full |
A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams |
title_fullStr |
A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams |
title_full_unstemmed |
A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams |
title_sort |
set-checking algorithm for mining maximal frequent itemsets from data streams |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/22045887069370251404 |
work_keys_str_mv |
AT peiyinglin asetcheckingalgorithmforminingmaximalfrequentitemsetsfromdatastreams AT línpèiyǐng asetcheckingalgorithmforminingmaximalfrequentitemsetsfromdatastreams AT peiyinglin yīgèyǐjíhéjiǎncháláicóngzīliàochuànliúzhōngsōuxúnzuìdàpínfánxiàngmùjídefāngfǎ AT línpèiyǐng yīgèyǐjíhéjiǎncháláicóngzīliàochuànliúzhōngsōuxúnzuìdàpínfánxiàngmùjídefāngfǎ AT peiyinglin setcheckingalgorithmforminingmaximalfrequentitemsetsfromdatastreams AT línpèiyǐng setcheckingalgorithmforminingmaximalfrequentitemsetsfromdatastreams |
_version_ |
1718094076691087360 |