Improving Data-Mining Efficiency by Predictive Itemsets
Master's === I-Shou University === Department of Information Engineering === 90 === Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Among the techniques proposed, finding association rules or sequential patterns from transaction databases is most commonly...
Main Authors: | Chyan-Yuan Horng, 洪乾元 |
---|---|
Other Authors: | Tzung-Pei Hong |
Format: | Others |
Language: | en_US |
Published: | 2002 |
Online Access: | http://ndltd.ncl.edu.tw/handle/05870665801916319828 |
id |
ndltd-TW-090ISU00392039 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-090ISU00392039 2015-10-13T17:39:45Z http://ndltd.ncl.edu.tw/handle/05870665801916319828 Improving Data-Mining Efficiency by Predictive Itemsets 利用預測項目集以增進資料挖掘效率 Chyan-Yuan Horng 洪乾元 Master's, I-Shou University, Department of Information Engineering, 90 Tzung-Pei Hong Shyue-Liang Wang 洪宗貝 王學亮 2002 thesis 72 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === I-Shou University === Department of Information Engineering === 90 === Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Among the techniques proposed, finding association rules or sequential patterns from transaction databases is among the most commonly seen in data mining. In the past, many algorithms for mining association rules or sequential patterns from transactions were proposed, most of which were executed in a level-wise manner. In this thesis, we propose novel mining algorithms to improve the efficiency of finding large itemsets or sequential patterns.
In the first part of this thesis, we propose a novel mining algorithm to improve the efficiency of finding large itemsets for association rules. The proposed algorithm is based on Denwattana and Getta's prediction concept and considers the data dependency in the given transactions. It aims at efficiently finding any p levels of large itemsets by scanning a database twice, except for the first level. A new estimation method is proposed to flexibly predict which candidate itemsets are promising and which are not.
In addition to mining association rules, mining sequential patterns is also very important in real applications. It is even more difficult than mining association rules. In the second part of this thesis, we thus extend our first approach to efficiently tackle the problem of mining sequential patterns. The proposed approach can be roughly divided into two parts. In the first part, any p levels of large itemsets are found by scanning a database twice. The large itemsets are then used in the second part as the large 1-sequences. Then any p levels of large sequences are found by scanning the database twice more. The approach is thus expected to provide a flexible and efficient way of finding sequential patterns in large databases.
Experimental results show that the proposed approach for finding association rules is more efficient than the Apriori algorithm when the minimum support value is not set to a large value. This is because when the minimum support value is quite large, the number of large itemsets becomes very small, and the time saved by pruning candidate itemsets in the proposed algorithm does not cover the additional overhead. The proposed algorithm is thus suitable for low or moderate minimum support values.
|
author2 |
Tzung-Pei Hong |
author_facet |
Tzung-Pei Hong Chyan-Yuan Horng 洪乾元 |
author |
Chyan-Yuan Horng 洪乾元 |
author_sort |
Chyan-Yuan Horng |
title |
Improving Data-Mining Efficiency by Predictive Itemsets |
publishDate |
2002 |
url |
http://ndltd.ncl.edu.tw/handle/05870665801916319828 |
_version_ |
1717783572710948864 |
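For readers unfamiliar with the level-wise baseline that the abstract compares against, the following is a minimal sketch of the textbook Apriori search for large (frequent) itemsets. It is not the thesis's predictive algorithm, whose details are not given in this record. The function name `apriori`, the parameters `transactions` and `min_support`, and the sample data are assumptions introduced only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook level-wise search for large (frequent) itemsets.

    transactions: list of sets of items (hypothetical input format)
    min_support:  minimum fraction of transactions containing an itemset
    Returns a dict mapping each large itemset (frozenset) to its count.
    """
    min_count = min_support * len(transactions)

    # Level 1: count single items in one database scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(large)

    k = 2
    while large:
        # Join step: build k-item candidates from large (k-1)-itemsets.
        candidates = set()
        for a, b in combinations(list(large), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be large.
            if all(frozenset(sub) in large
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One database scan per level to count surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        large = {s: c for s, c in counts.items() if c >= min_count}
        result.update(large)
        k += 1
    return result


if __name__ == "__main__":
    sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
    for itemset, count in apriori(sample, 0.5).items():
        print(sorted(itemset), count)
```

Note that this baseline scans the database once per level, which is the cost the abstract's approach aims to reduce by finding p levels with two scans. A higher `min_support` leaves fewer large itemsets at each level, so there is less candidate pruning to gain from, which is consistent with the experimental remark in the abstract.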