Implementation and Performance Evaluation of Association Rule Mining Algorithms
Master's === National Taiwan University of Science and Technology === Department of Information Management === 90 === Mining association rules is a popular research area in data mining. Many association rule mining algorithms have been proposed in recent years. The author of each algorithm claims that it performs best under some specific conditions....
Main Authors: | Shih-Chun Chiu, 邱士軍 |
---|---|
Other Authors: | Yungho Leu, 呂永和 |
Format: | Others |
Language: | zh-TW |
Published: | 2002 |
Online Access: | http://ndltd.ncl.edu.tw/handle/77869041005439613437 |
id |
ndltd-TW-090NTUST396019 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-090NTUST396019 2015-10-13T14:41:23Z http://ndltd.ncl.edu.tw/handle/77869041005439613437 Implementation and Performance Evaluation of Association Rule Mining Algorithms 關聯規則演算法之實作和效能評估 Shih-Chun Chiu 邱士軍 Master's National Taiwan University of Science and Technology Department of Information Management 90 Mining association rules is a popular research area in data mining. Many association rule mining algorithms have been proposed in recent years. The author of each algorithm claims that it performs best under some specific conditions. A fair comparison serves as a guide for choosing the right mining algorithm for a given condition. Unfortunately, no neutral third party has conducted a comprehensive comparison of the association rule mining algorithms. In this thesis, we compare the performance of five well-known algorithms: Apriori, Boolean, FP-Growth, Maxminer, and DIC. We implemented several versions of each algorithm. For each algorithm, we then chose the most efficient implementation among our own versions and, where one is available, the implementation provided directly by the original author. We also describe the details of our implementations. To compare the performance of the algorithms, we use three synthetic transactional databases generated by the IBM dataset generator and the FoodMart database, a real transactional database from SQL Server. The three synthetic databases are T5I2, T10I4, and T20I6. They differ in mean transaction length and mean frequent-itemset length. Experiments show that no algorithm prevails in all circumstances. The Apriori and DIC algorithms prevail when the minimum support is high and, therefore, less computation time is needed. On the other hand, the Boolean and FP-Growth algorithms scale up well in the sense that they prevail under low minimum support. Furthermore, the Boolean and FP-Growth algorithms significantly outperform the other algorithms when the mean transaction length is long.
We also found that the memory occupied by the FP-tree is at least as large as the transactional database itself. Yungho Leu 呂永和 2002 degree thesis ; thesis 62 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Information Management === 90 === Mining association rules is a popular research area in data mining. Many association rule mining algorithms have been proposed in recent years. The author of each algorithm claims that it performs best under some specific conditions. A fair comparison serves as a guide for choosing the right mining algorithm for a given condition. Unfortunately, no neutral third party has conducted a comprehensive comparison of the association rule mining algorithms. In this thesis, we compare the performance of five well-known algorithms: Apriori, Boolean, FP-Growth, Maxminer, and DIC. We implemented several versions of each algorithm. For each algorithm, we then chose the most efficient implementation among our own versions and, where one is available, the implementation provided directly by the original author. We also describe the details of our implementations. To compare the performance of the algorithms, we use three synthetic transactional databases generated by the IBM dataset generator and the FoodMart database, a real transactional database from SQL Server. The three synthetic databases are T5I2, T10I4, and T20I6. They differ in mean transaction length and mean frequent-itemset length.
Experiments show that no algorithm prevails in all circumstances. The Apriori and DIC algorithms prevail when the minimum support is high and, therefore, less computation time is needed. On the other hand, the Boolean and FP-Growth algorithms scale up well in the sense that they prevail under low minimum support. Furthermore, the Boolean and FP-Growth algorithms significantly outperform the other algorithms when the mean transaction length is long. We also found that the memory occupied by the FP-tree is at least as large as the transactional database itself.
|
author2 |
Yungho Leu |
author_facet |
Yungho Leu Shih-Chun Chiu 邱士軍 |
author |
Shih-Chun Chiu 邱士軍 |
spellingShingle |
Shih-Chun Chiu 邱士軍 Implementation and Performance Evaluation of Association Rule Mining Algorithms |
author_sort |
Shih-Chun Chiu |
title |
Implementation and Performance Evaluation of Association Rule Mining Algorithms |
title_short |
Implementation and Performance Evaluation of Association Rule Mining Algorithms |
title_full |
Implementation and Performance Evaluation of Association Rule Mining Algorithms |
title_fullStr |
Implementation and Performance Evaluation of Association Rule Mining Algorithms |
title_full_unstemmed |
Implementation and Performance Evaluation of Association Rule Mining Algorithms |
title_sort |
implementation and performance evaluation of association rule mining algorithms |
publishDate |
2002 |
url |
http://ndltd.ncl.edu.tw/handle/77869041005439613437 |
work_keys_str_mv |
AT shihchunchiu implementataionandperformanceevaluationofassociationruleminingalgorithms AT qiūshìjūn implementataionandperformanceevaluationofassociationruleminingalgorithms AT shihchunchiu guānliánguīzéyǎnsuànfǎzhīshízuòhéxiàonéngpínggū AT qiūshìjūn guānliánguīzéyǎnsuànfǎzhīshízuòhéxiàonéngpínggū |
_version_ |
1717756291331391488 |
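
Note on the abstract: its central variable is the minimum support threshold, which decides how many itemsets count as frequent and therefore how much work each algorithm does. The following is a minimal sketch of the level-wise Apriori idea, not the thesis's implementation. The function name `apriori`, the toy `transactions` list, and the convention of giving support as a fraction of transactions are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def keep_frequent(candidates):
        # One pass over the data: count each candidate, keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n / total >= min_support}

    # Level 1: single items.
    frequent = keep_frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = keep_frequent(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy database (not one of the thesis's datasets).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

high = apriori(transactions, 0.6)
low = apriori(transactions, 0.4)
print(len(high), len(low))
```

On this toy data, lowering the threshold from 0.6 to 0.4 raises the number of frequent itemsets from 8 to 17 and adds a level of 3-itemsets. This is consistent with the abstract's observation that low minimum support is where the choice of algorithm matters most.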