Improving Data-Mining Efficiency by Predictive Itemsets

Master's === I-Shou University === Department of Information Engineering === 90 === Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Among the techniques proposed, finding association rules or sequential patterns from transaction databases is most commonly...

Bibliographic Details
Main Authors:	Chyan-Yuan Horng, 洪乾元
Other Authors:	Tzung-Pei Hong
Format:	Others
Language:	en_US
Published:	2002
Online Access:	http://ndltd.ncl.edu.tw/handle/05870665801916319828

id	ndltd-TW-090ISU00392039
record_format	oai_dc
spelling	ndltd-TW-090ISU00392039 2015-10-13T17:39:45Z http://ndltd.ncl.edu.tw/handle/05870665801916319828 Improving Data-Mining Efficiency by Predictive Itemsets 利用預測項目集以增進資料挖掘效率 Chyan-Yuan Horng 洪乾元 Master's, I-Shou University, Department of Information Engineering, 90 Tzung-Pei Hong Shyue-Liang Wang 洪宗貝 王學亮 2002 degree thesis ; thesis 72 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === I-Shou University === Department of Information Engineering === 90 === Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Among the proposed techniques, finding association rules or sequential patterns in transaction databases is the most common. Many algorithms for mining association rules or sequential patterns from transactions have been proposed, and most of them work level by level, with one database scan per level. In this thesis, we propose novel mining algorithms that find large itemsets and sequential patterns more efficiently. In the first part of this thesis, we propose a mining algorithm that finds large itemsets for association rules more efficiently. It is based on the prediction concept of Denwattana and Getta and takes into account the data dependency within the given transactions. It aims to find any p levels of large itemsets with only two database scans beyond the scan for the first level. A new estimation method is proposed to flexibly predict which candidate itemsets are promising and which are non-promising. Besides association rules, mining sequential patterns is also very important in real applications, and it is even harder than mining association rules. In the second part of this thesis, we therefore extend the first approach to mine sequential patterns efficiently. This approach has two phases. In the first phase, any p levels of large itemsets are found by scanning the database twice. These large itemsets are then used as the large 1-sequences in the second phase, where any p levels of large sequences are found with two further database scans. The approach is thus expected to provide a flexible and efficient way of finding sequential patterns in large databases.
Experimental results show that the proposed association-rule algorithm is more efficient than the Apriori algorithm unless the minimum support is set high. When the minimum support is high, the number of large itemsets becomes very small, so the time saved by pruning candidate itemsets does not cover the algorithm's additional overhead. The proposed algorithm is therefore best suited to low or medium minimum support values.
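The two-scan idea described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the independence-based support estimate (`estimate`), the `margin` parameter, and all function names are hypothetical assumptions chosen for illustration, and the thesis's own estimation method is not reproduced. After one scan for the first level, the sketch predicts candidates for levels 2 to p, verifies them in one scan, and counts any itemsets missed by wrong predictions in a second scan. As a result, no large itemset of size p or less is lost.

```python
from itertools import combinations
from math import prod


def count_supports(transactions, candidates):
    """One database scan: count the transactions containing each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def join(level, k):
    """Apriori-style join: k-itemsets whose (k-1)-subsets all lie in `level`."""
    out = set()
    for a, b in combinations(list(level), 2):
        u = a | b
        if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
            out.add(u)
    return out


def predictive_mine(transactions, min_count, p, margin=1.0):
    """Find all large itemsets of size <= p with at most two scans after level 1.

    Hypothetical sketch: the independence-based estimator and `margin`
    are illustrative assumptions, not the thesis's estimation method.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # Scan for the first level: exact counts of every single item.
    item_counts = {}
    for t in transactions:
        for i in t:
            item_counts[i] = item_counts.get(i, 0) + 1
    scans = 1
    known = {frozenset([i]): c for i, c in item_counts.items()}

    def estimate(itemset):
        # Assumed estimator: expected count if items occurred independently.
        return n * prod(item_counts[i] / n for i in itemset)

    # Prediction (no scan): extend levels 2..p only from promising itemsets,
    # but keep non-promising candidates too so that scan 1 checks them.
    promising = {s for s, c in known.items() if c >= min_count}
    predicted = set()
    for k in range(2, p + 1):
        cands = join(promising, k)
        predicted |= cands
        promising = {c for c in cands if estimate(c) >= margin * min_count}
        if not promising:
            break

    # Scan 1: verify every predicted candidate.
    if predicted:
        known.update(count_supports(transactions, predicted))
        scans += 1

    # Recover mispredictions: collect every uncounted itemset whose
    # (k-1)-subsets are all either known large or themselves uncounted.
    frontier = {s for s, c in known.items() if len(s) == 1 and c >= min_count}
    missed = set()
    for k in range(2, p + 1):
        cands = join(frontier, k)
        new = {c for c in cands if c not in known}
        missed |= new
        frontier = {c for c in cands if c in known and known[c] >= min_count} | new
        if not frontier:
            break

    # Scan 2: count the recovered candidates, if any.
    if missed:
        known.update(count_supports(transactions, missed))
        scans += 1

    large = {s: c for s, c in known.items() if c >= min_count}
    return large, scans
```

The recovery step keeps the result complete regardless of how good the estimator is. A poor prediction only shifts counting work from the first verification scan to the second and never drops a large itemset. Raising `margin` makes predictions more conservative, so more itemsets tend to be recovered in the second scan.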
author2	Tzung-Pei Hong
author_facet	Tzung-Pei Hong Chyan-Yuan Horng 洪乾元
author	Chyan-Yuan Horng 洪乾元
spellingShingle	Chyan-Yuan Horng 洪乾元 Improving Data-Mining Efficiency by Predictive Itemsets
author_sort	Chyan-Yuan Horng
title	Improving Data-Mining Efficiency by Predictive Itemsets
title_short	Improving Data-Mining Efficiency by Predictive Itemsets
title_full	Improving Data-Mining Efficiency by Predictive Itemsets
title_fullStr	Improving Data-Mining Efficiency by Predictive Itemsets
title_full_unstemmed	Improving Data-Mining Efficiency by Predictive Itemsets
title_sort	improving data-mining efficiency by predictive itemsets
publishDate	2002
url	http://ndltd.ncl.edu.tw/handle/05870665801916319828
work_keys_str_mv	AT chyanyuanhorng improvingdataminingefficiencybypredictiveitemsets AT hónggānyuán improvingdataminingefficiencybypredictiveitemsets AT chyanyuanhorng lìyòngyùcèxiàngmùjíyǐzēngjìnzīliàowājuéxiàolǜ AT hónggānyuán lìyòngyùcèxiàngmùjíyǐzēngjìnzīliàowājuéxiàolǜ
_version_	1717783572710948864

Similar Items