Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment
Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 97 === Information is important and valuable today, and it helps people make decisions, such as those about market strategy. People depend more and more on useful information. For this reason, many methods have been invented to extract useful information from the huge da...
Main Authors: | Pai-Yu Lin 林柏佑 |
---|---|
Other Authors: | Bi-Ru Dai 戴碧如 |
Format: | Others |
Language: | en_US |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/25629406614318367928 |
id |
ndltd-TW-097NTUS5392054 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097NTUS5392054 2016-05-02T04:11:39Z http://ndltd.ncl.edu.tw/handle/25629406614318367928 Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment 以不需重新存取資料庫的方式來有效探勘動態資料庫中的頻繁項目集 (Efficiently Mining Frequent Itemsets in Dynamic Databases Without Re-accessing the Database) Pai-Yu Lin 林柏佑 Master's thesis National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 97 Bi-Ru Dai 戴碧如 2009 degree thesis ; thesis 61 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 97 === Information is important and valuable today, and it helps people make decisions, such as those about market strategy. People depend more and more on useful information. For this reason, many methods have been invented to extract useful information from huge volumes of data, and data mining technology has been growing rapidly. Many helpful algorithms and applications have been proposed in recent years, and researchers are still working to develop more efficient ones.
Frequent pattern mining plays an important role in the data mining community because it is usually a fundamental step in various mining tasks. However, maintaining frequent patterns is very expensive in an incremental database. In addition, the status of a pattern changes over time: a frequent pattern may become infrequent, and vice versa. To find all frequent patterns exactly, most algorithms have to scan the original database completely whenever an update occurs.
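The status change described above can be illustrated with a small sketch. It is not taken from the thesis: the transactions, the 50% relative support threshold, and the helper name `frequent_itemsets` are all assumptions chosen for illustration, and the function is the naive full-rescan baseline that recounts every itemset from scratch.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Naive baseline: rescan all transactions and count every itemset.

    Returns {itemset: count} for itemsets whose relative support
    is at least min_support (a fraction between 0 and 1).
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    needed = min_support * len(transactions)
    return {s: c for s, c in counts.items() if c >= needed}


# Hypothetical data: an original database and a later increment.
original = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
increment = [{"a", "c"}, {"a", "c"}, {"a", "c"}, {"c", "d"}]

before = frequent_itemsets(original, 0.5)
after = frequent_itemsets(original + increment, 0.5)

ab, ac = frozenset({"a", "b"}), frozenset({"a", "c"})
print(ab in before, ab in after)  # {a, b}: frequent before, infrequent after
print(ac in before, ac in after)  # {a, c}: infrequent before, frequent after
```

In this toy data, {a, b} drops out and {a, c} enters once the increment arrives, and the baseline only discovers this by scanning the original transactions again.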
In this work, we propose two new algorithms, iTM and ECEM, which mine frequent itemsets in the incremental environment without rescanning the whole database. These algorithms use a compressed structure and quickly project the transaction dataset into it. Because the structure has a good compression ratio, we are able to preserve the frequencies of all items. Furthermore, the algorithms do not need to rescan the database when the user-defined threshold changes.
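This record does not describe the internals of iTM or ECEM, so the following is only a rough sketch of the general idea, not the thesis's method. It assumes a much simpler compressed structure (one counter entry per distinct transaction) and a plain Apriori-style level-wise search; the class name `CompressedStore` and all of its methods are hypothetical. It shows how a summary that keeps every count can absorb an increment and answer queries at a new threshold without touching the raw transactions again.

```python
from collections import Counter
from itertools import combinations


class CompressedStore:
    """Hypothetical summary: one counter entry per distinct transaction."""

    def __init__(self):
        self.patterns = Counter()  # frozenset(items) -> occurrences
        self.size = 0              # total transactions absorbed so far

    def add(self, transactions):
        """Absorb an increment; earlier transactions are never re-read."""
        for t in transactions:
            self.patterns[frozenset(t)] += 1
            self.size += 1

    def support(self, itemset):
        """Absolute support, computed from the summary only."""
        itemset = frozenset(itemset)
        return sum(n for t, n in self.patterns.items() if itemset <= t)

    def frequent(self, min_support):
        """Level-wise search over the summary at any relative threshold."""
        needed = min_support * self.size
        result = {}
        level = {frozenset([i]) for t in self.patterns for i in t}
        while level:
            kept = {c: s for c in level if (s := self.support(c)) >= needed}
            result.update(kept)
            # Join frequent k-itemsets that differ in exactly one item.
            level = {a | b for a, b in combinations(kept, 2)
                     if len(a | b) == len(a) + 1}
        return result


store = CompressedStore()
store.add([{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}])
first = store.frequent(0.5)

store.add([{"a", "c"}, {"a", "c"}, {"a", "c"}, {"c", "d"}])  # increment
second = store.frequent(0.5)    # updated result, no rescan of raw data
relaxed = store.frequent(0.25)  # threshold changed, still no rescan
```

Because no count is discarded, lowering the threshold afterwards only requires another pass over the summary. How compact such a summary is in practice depends on how often transactions repeat, which is where a purpose-built structure like the one proposed in the thesis would matter.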
We also design several experiments to verify the performance of our algorithms, using various transaction databases. The results demonstrate that our algorithms extract the exact frequent itemsets from the transaction database at low cost, and similar results are obtained on huge databases. In this study, our algorithms reduce the cost of the scanning step and guarantee an acceptable response time.
|
author2 |
Bi-Ru Dai |
author_facet |
Bi-Ru Dai Pai-Yu Lin 林柏佑 |
author |
Pai-Yu Lin 林柏佑 |
author_sort |
Pai-Yu Lin |
title |
Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/25629406614318367928 |