An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
Master's === Feng Chia University === Institute of Information Engineering === 91 === Existing algorithms for mining association rules still suffer from problems related to database scans. Some of them (e.g., Apriori) may require too many database scans, especially when maximum frequent itemsets are too long (long...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | en_US |
Published: | 2003 |
Online Access: | http://ndltd.ncl.edu.tw/handle/burzr2 |
id |
ndltd-TW-091FCU05392009 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-091FCU05392009 2018-06-25T06:06:39Z http://ndltd.ncl.edu.tw/handle/burzr2 An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets 一個以封閉式集合為基礎能夠有效率尋找最大高頻項目組的方法 Chun-Jung Chu 朱俊榮 Master's Feng Chia University Institute of Information Engineering 91 Don-lin Yang 楊東麟 2003 thesis 62 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === Feng Chia University === Institute of Information Engineering === 91 === Existing algorithms for mining association rules still suffer from problems related to database scans. Some of them (e.g., Apriori) may require too many database scans, especially when the maximum frequent itemsets are long (longer than 12 items). Others (e.g., Pincer-Search) may perform well when the maximum frequent itemsets are long, but perform very poorly when they are of medium length (about 8 to 12 items). We therefore propose the CMFI (Close Maximum Frequent Itemset) search algorithm to address these problems. Our method combines the concepts of the Close algorithm and the Pincer-Search algorithm.
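To illustrate why long frequent itemsets are costly for a level-wise search, the following is a minimal Python sketch of an Apriori-style loop that counts its passes over the data. It is not the CMFI algorithm from the thesis; the function name `apriori_with_scan_count`, the in-memory list of transactions, and the absolute `min_support` threshold are assumptions made for this example.

```python
from itertools import combinations


def apriori_with_scan_count(transactions, min_support):
    """Level-wise (Apriori-style) search that also counts database passes.

    Hypothetical helper for illustration: `transactions` is an iterable of
    item collections and `min_support` is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]
    scans = 0
    frequent = {}
    # Level-1 candidates: every single item seen in the data. A real
    # implementation would collect these during the first pass.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        scans += 1  # one full pass over the database per level
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


if __name__ == "__main__":
    data = [{"a", "b", "c", "d"}] * 3 + [{"a", "b"}]
    found, passes = apriori_with_scan_count(data, min_support=3)
    print(len(found), "frequent itemsets found in", passes, "passes")
```

In this sketch a frequent itemset of length n cannot be confirmed before the n-th pass, so the number of passes grows with the length of the longest frequent itemset.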
The CMFI algorithm is suited not only to long maximum frequent itemsets (longer than 12) but also to short ones (shorter than 8). CMFI performs very well in our experiments, especially when the maximum frequent itemsets in the database are of medium length or long. In most cases, the maximum itemsets are not frequent; the maximum frequent itemsets are usually shorter than the longest itemset in the database. Consequently, algorithms suited only to long maximum frequent itemsets perform very poorly when the maximum frequent itemsets are medium or short. Conversely, algorithms suited only to short maximum frequent itemsets perform very poorly when the maximum frequent itemsets are medium or long. The CMFI algorithm handles these cases effectively.
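The terms used above can be made concrete with a small brute-force sketch in Python. It only illustrates the standard definitions of frequent, closed, and maximal itemsets on toy data and does not reproduce CMFI, Close, or Pincer-Search; all function names and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; fine for toy data, exponential in general."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def closed_itemsets(frequent):
    """Frequent itemsets that have no proper superset with equal support."""
    return {
        s: n for s, n in frequent.items()
        if not any(s < t and n == m for t, m in frequent.items())
    }


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s: n for s, n in frequent.items() if not any(s < t for t in frequent)}


if __name__ == "__main__":
    # Hypothetical toy database: the longest transaction has 5 items.
    data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "b", "c", "d", "e"}]
    freq = frequent_itemsets(data, min_support=2)
    print("frequent:", len(freq))
    print("closed:  ", sorted("".join(sorted(s)) for s in closed_itemsets(freq)))
    print("maximal: ", sorted("".join(sorted(s)) for s in maximal_itemsets(freq)))
```

On this toy data the only maximal frequent itemset has 3 items even though the longest transaction has 5, which matches the observation that maximum frequent itemsets are usually shorter than the longest itemset in the database. Every frequent itemset is a subset of some maximal one, so the maximal itemsets form a compact summary of the result.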
Our experiments compare our approach with the Close and Pincer-Search methods. CMFI performs 20% to 60% better than Pincer-Search when the frequent itemsets are of medium length, and 60% to 80% better than Close when the maximal closed frequent itemsets are long. The advantage of our approach grows as the database becomes larger.
In practice, most maximal candidate itemsets in a database are not frequent. For example, in supermarket transactions we found that most frequent itemsets are of medium length. In many cases of market basket analysis, the maximum frequent itemsets are half as long as the longest itemsets. Because the CMFI algorithm is very effective at mining maximal frequent itemsets of this kind, it is very useful in market basket analysis applications.
|
author2 |
Don-lin Yang |
author_facet |
Don-lin Yang Chun-Jung Chu 朱俊榮 |
author |
Chun-Jung Chu 朱俊榮 |
spellingShingle |
Chun-Jung Chu 朱俊榮 An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets |
author_sort |
Chun-Jung Chu |
title |
An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets |
title_short |
An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets |
title_full |
An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets |
title_fullStr |
An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets |
title_full_unstemmed |
An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets |
title_sort |
efficient closure-based method for discovering maximal frequent itemsets |
publishDate |
2003 |
url |
http://ndltd.ncl.edu.tw/handle/burzr2 |
work_keys_str_mv |
AT chunjungchu anefficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets AT zhūjùnróng anefficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets AT chunjungchu yīgèyǐfēngbìshìjíhéwèijīchǔnénggòuyǒuxiàolǜxúnzhǎozuìdàgāopínxiàngmùzǔdefāngfǎ AT zhūjùnróng yīgèyǐfēngbìshìjíhéwèijīchǔnénggòuyǒuxiàolǜxúnzhǎozuìdàgāopínxiàngmùzǔdefāngfǎ AT chunjungchu efficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets AT zhūjùnróng efficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets |
_version_ |
1718706385864097792 |