Implementation and Performance Evaluation of Association Rule Mining Algorithms

Master's thesis, Department of Information Management, National Taiwan University of Science and Technology (國立臺灣科技大學), academic year 90 (ROC calendar).


Bibliographic Details
Main Author: Shih-Chun Chiu (邱士軍)
Other Authors: Yungho Leu
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/77869041005439613437
Record ID: ndltd-TW-090NTUST396019
Chinese title: 關聯規則演算法之實作和效能評估 (Implementation and Performance Evaluation of Association Rule Algorithms)
Chinese names: 邱士軍 (Shih-Chun Chiu); 呂永和 (Yungho Leu)
Type: thesis (學位論文), 62 pages
Collection: NDLTD
Abstract: Association rule mining is a popular research area in data mining, and many algorithms for it have been proposed in recent years. The author of each algorithm claims that it performs best under certain conditions. A fair comparison would serve as a guide for choosing the right algorithm for a given situation; unfortunately, no independent third party has conducted a comprehensive comparison of association rule mining algorithms. In this thesis, we compare the performance of five well-known algorithms: Apriori, Boolean, FP-Growth, Maxminer, and DIC. We implemented several versions of each algorithm, then selected the most efficient one from among our implementations and the implementation provided by the original author, where one was available. We also describe the details of our implementations. To compare the algorithms, we use three synthetic transactional databases generated by the IBM dataset generator (T5I2, T10I4, and T20I6, which differ in mean transaction length T and mean frequent-itemset length I) together with the FoodMart database, a real transactional database from SQL Server. Experiments show that no algorithm prevails in all circumstances. Apriori and DIC perform best when the minimum support is high and, therefore, less computation is needed. Boolean and FP-Growth, on the other hand, scale up well: they perform best under low minimum support, and they significantly outperform the other algorithms when the mean transaction length is long. We also found that the FP-tree occupies at least as much memory as the transactional database itself.
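The abstract attributes Apriori's advantage to high minimum support. As a rough illustration of why, here is a minimal level-wise Apriori sketch in Python. It is not the thesis's implementation (that code is not part of this record); the function name `apriori`, its `min_support` parameter (a fraction of all transactions), and the toy transactions are illustrative assumptions. The join step is simplified to a pairwise union of frequent itemsets rather than the classic prefix-ordered join; the prune step makes the resulting candidate set the same.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch (illustrative, not the thesis code).

    Returns {frozenset(itemset): support_count} for every itemset whose
    support (count / number of transactions) is at least min_support.
    """
    n = len(transactions)
    min_count = min_support * n
    db = [frozenset(t) for t in transactions]

    # Level 1: one pass over the database to count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: cnt for s, cnt in counts.items() if cnt >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # Join step (simplified): union two frequent (k-1)-itemsets.
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must already be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Count step: one more full pass over the database for this level.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: cnt for s, cnt in counts.items() if cnt >= min_count}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Toy market-basket data (hypothetical, not the thesis datasets).
    db = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    # At 60% support this prints 8 frequent itemsets (4 singletons, 4 pairs).
    for itemset, count in sorted(apriori(db, 0.6).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

Each level costs one full pass over the database, and lowering `min_support` lets more itemsets survive into later levels, so both the number of candidates and the number of passes grow. That behaviour is consistent with the abstract's finding that Apriori does best at high minimum support, while FP-Growth and Boolean scale better at low support.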