Improving Data-Mining Efficiency by Predictive Itemsets

Master's === I-Shou University === Department of Information Engineering === 90 === Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Among the techniques proposed, finding association rules or sequential patterns from transaction databases is most commonly...


Bibliographic Details
Main Authors: Chyan-Yuan Horng, 洪乾元
Other Authors: Tzung-Pei Hong
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/05870665801916319828
id ndltd-TW-090ISU00392039
record_format oai_dc
spelling ndltd-TW-090ISU00392039 2015-10-13T17:39:45Z http://ndltd.ncl.edu.tw/handle/05870665801916319828 Improving Data-Mining Efficiency by Predictive Itemsets 利用預測項目集以增進資料挖掘效率 Chyan-Yuan Horng 洪乾元 Master's I-Shou University Department of Information Engineering 90 Tzung-Pei Hong Shyue-Liang Wang 洪宗貝 王學亮 2002 thesis 72 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === I-Shou University === Department of Information Engineering === 90 === Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Among the techniques proposed, finding association rules or sequential patterns in transaction databases is the most common. Many algorithms for mining association rules or sequential patterns from transactions have been proposed in the past, most of which run as level-wise processes. In this thesis, we propose novel mining algorithms to improve the efficiency of finding large itemsets or sequential patterns. In the first part of the thesis, we propose a novel mining algorithm to improve the efficiency of finding large itemsets for association rules. The proposed algorithm is based on Denwattana and Getta's prediction concept and considers the data dependency in the given transactions. It aims at efficiently finding any p levels of large itemsets by scanning a database twice, except for the first level. A new estimation method is proposed to flexibly predict promising and non-promising candidate itemsets. In addition to mining association rules, mining sequential patterns is also very important in real applications, and it is even more difficult than mining association rules. In the second part of the thesis, we therefore extend our first approach to efficiently tackle the problem of mining sequential patterns. The proposed approach can be roughly divided into two parts. In the first part, any p levels of large itemsets are found by scanning a database twice. The large itemsets are then used in the second part as the large 1-sequences. Any p levels of large sequences are then found by scanning the database a further two times. The approach is thus expected to provide a flexible and efficient way of finding sequential patterns in large databases.
Experimental results show that the proposed approach for finding association rules is more efficient than the Apriori algorithm when the minimum support value is not set at a large value. When the minimum support value is quite large, the number of large itemsets becomes very small, so the time saved by pruning candidate itemsets in the proposed algorithm does not cover the additional overhead. The proposed algorithm is thus suitable for low or middle minimum support values.
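For orientation, the level-wise baseline that the abstract compares against can be sketched as below. This is a minimal illustration of the classic Apriori procedure, not the predictive algorithm proposed in the thesis. The function name `apriori`, the fractional `min_support` parameter, and the `scans` counter are assumptions introduced only for this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search for large (frequent) itemsets.

    Illustrative sketch only. Returns (large, scans): a dict mapping each
    large itemset (a frozenset) to its support count, and the number of
    passes made over the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # Level 1: one scan to count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    scans = 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    large = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of large (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be large.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        if not candidates:
            break

        # Each further level costs one more database scan.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        scans += 1

        current = {s: c for s, c in counts.items() if c >= min_count}
        large.update(current)
        k += 1
    return large, scans


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the thesis.
    tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for support in (0.5, 0.7):
        large, scans = apriori(tx, support)
        print(support, len(large), scans)
```

In this baseline every level costs one pass over the database, which is the cost the thesis aims to reduce by finding p levels with two scans. On the toy data, raising the minimum support also shrinks the set of large itemsets, in line with the abstract's remark that few large itemsets remain at high support values.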
author2 Tzung-Pei Hong
author_facet Tzung-Pei Hong
Chyan-Yuan Horng
洪乾元
author Chyan-Yuan Horng
洪乾元
title Improving Data-Mining Efficiency by Predictive Itemsets
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/05870665801916319828