A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams

Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering, Graduate Program === academic year 99


Bibliographic Details
Main Author: Pei-Ying Lin (林佩穎)
Other Authors: Ye-In Chang
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/22045887069370251404
id ndltd-TW-099NSYS5392025
record_format oai_dc
spelling ndltd-TW-099NSYS5392025 2015-10-19T04:03:18Z http://ndltd.ncl.edu.tw/handle/22045887069370251404 A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams 一個以集合檢查來從資料串流中搜尋最大頻繁項目集的方法 (A Method for Searching for Maximal Frequent Itemsets in Data Streams Using Set Checking) Pei-Ying Lin 林佩穎 Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, Graduate Program, academic year 99 Ye-In Chang 張玉盈 2011 學位論文 (degree thesis) ; thesis 79 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering, Graduate Program === academic year 99 === Online mining of maximal frequent itemsets over data streams is an important problem in data mining. A maximal frequent itemset is an itemset whose support is greater than or equal to the minimum support and that has no frequent proper superset. Algorithms designed to mine maximal frequent itemsets from traditional databases are not suitable for data streams, because data streams are (1) continuous, (2) fast, (3) unbounded, (4) real-time, and (5) can be scanned only once. These characteristics pose several new challenges. First, it is unrealistic to keep the entire stream in main memory, or even in secondary storage, since a data stream arrives continuously and its volume is unbounded. Second, traditional methods that mine stored datasets with multiple scans are infeasible, since streaming data can be scanned only once. Third, stream mining requires fast, real-time processing to keep up with the high data arrival rate, and mining results are expected within a short response time. To mine maximal frequent itemsets from data streams under the landmark window model, Mao et al. proposed the INSTANT algorithm. In the landmark window model, knowledge discovery is performed on the data from a fixed starting time (the landmark) up to the present. The advantage of the landmark window model is that its results are accurate compared with those obtained under other models. INSTANT uses a simple structure and saves a great deal of memory, but it takes a long time to mine the maximal frequent itemsets: when a new transaction arrives, INSTANT performs too many comparisons against the previously stored transactions.
In this thesis, we propose the Set-Checking algorithm to mine maximal frequent itemsets from data streams under the landmark window model. We store our information in a lattice, which records the subset relationship between each child node and its parent node; each node records an itemset and its support. When a new transaction arrives, we consider five relations between the transaction and the itemsets in the lattice: (1) equivalent, (2) superset, (3) subset, (4) intersection, and (5) empty. Based on these five relations and the lattice structure, we can insert the transaction and update the supports efficiently. Our simulation results show that the Set-Checking algorithm has a shorter processing time than the INSTANT algorithm.
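The abstract names five relations between a new transaction and the stored itemsets but does not define their direction or the exact update rules. The minimal Python sketch below assumes the relations are named from the transaction's point of view (for example, "superset" means the transaction properly contains the stored itemset) and shows one way such a classification can drive exact support counting in a landmark window. It is not the Set-Checking algorithm itself: the update rule (intersecting the transaction with each stored closed itemset) is a common closed-itemset technique swapped in for illustration, a flat dictionary replaces the thesis's parent/child lattice, and `relation`, `LandmarkLattice`, `insert_transaction`, and `maximal_frequent` are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def relation(t: Itemset, s: Itemset) -> str:
    """Classify incoming transaction t against stored itemset s.

    ASSUMPTION: the five relation names are read from the transaction's
    point of view (e.g. "superset" means t is a proper superset of s).
    """
    if t == s:
        return "equivalent"
    if t > s:
        return "superset"
    if t < s:
        return "subset"
    if t & s:
        return "intersection"
    return "empty"


class LandmarkLattice:
    """Hypothetical store of closed itemsets with exact supports counted
    from the landmark (first transaction) up to now. A flat dict stands in
    for the thesis's parent/child lattice, so every node is compared."""

    def __init__(self) -> None:
        self.support: Dict[Itemset, int] = {}
        self.n = 0  # transactions seen since the landmark

    def insert_transaction(self, items: Iterable[str]) -> None:
        t = frozenset(items)
        self.n += 1
        if not t:
            return
        # old[x] = support of itemset x before t arrived; t itself starts at 0.
        old: Dict[Itemset, int] = {t: 0}
        for s, sup in self.support.items():
            rel = relation(t, s)
            if rel == "empty":
                continue  # disjoint: s is unaffected
            if rel in ("equivalent", "superset"):
                x = s  # s is contained in t
            elif rel == "subset":
                x = t  # t is contained in s
            else:
                x = s & t  # partial overlap yields a new candidate itemset
            old[x] = max(old.get(x, 0), sup)
        # Every affected itemset is contained in t, so it gains one occurrence.
        for x, sup in old.items():
            self.support[x] = sup + 1

    def maximal_frequent(self, min_sup: float) -> List[Itemset]:
        """Frequent stored itemsets that have no frequent proper superset."""
        frequent = [s for s, c in self.support.items() if c >= min_sup * self.n]
        return [s for s in frequent if not any(s < o for o in frequent)]


if __name__ == "__main__":
    lattice = LandmarkLattice()
    for tx in (["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c", "d"]):
        lattice.insert_transaction(tx)
    print(sorted(sorted(s) for s in lattice.maximal_frequent(0.5)))
```

For the four transactions in the `__main__` block and a minimum support of 0.5 (at least two of four transactions), the sketch reports only {a, b, c}. Because every maximal frequent itemset is also closed, filtering the stored closed itemsets is enough to report them; a real lattice with parent/child links would let an implementation skip many of the comparisons this flat version performs.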
author2 Ye-In Chang
author Pei-Ying Lin
林佩穎
title A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/22045887069370251404