An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets

Master's === Feng Chia University === Graduate Institute of Information Engineering === 91 === Existing algorithms for mining association rules still have many problems related to database scans. Some of them (e.g., Apriori) may require too many database scans, especially when the maximal frequent itemsets are too long (long...

Bibliographic Details
Main Authors: Chun-Jung Chu, 朱俊榮
Other Authors: Don-lin Yang
Format: Others
Language: en_US
Published: 2003
Online Access: http://ndltd.ncl.edu.tw/handle/burzr2
id ndltd-TW-091FCU05392009
record_format oai_dc
spelling ndltd-TW-091FCU053920092018-06-25T06:06:39Z http://ndltd.ncl.edu.tw/handle/burzr2 An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets 一個以封閉式集合為基礎能夠有效率尋找最大高頻項目組的方法 Chun-Jung Chu 朱俊榮 Master's Feng Chia University Graduate Institute of Information Engineering 91 Don-lin Yang 楊東麟 2003 thesis 62 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === Feng Chia University === Graduate Institute of Information Engineering === 91 === Existing algorithms for mining association rules still have many problems related to database scans. Some of them (e.g., Apriori) may require too many database scans, especially when the maximal frequent itemsets are long (longer than 12). Others (e.g., Pincer-Search) may perform well when the maximal frequent itemsets are long but perform very poorly when they are of medium length (about 8 to 12). We therefore propose the CMFI (Close Maximum Frequent Itemset) search algorithm to solve these problems. Our method combines the concepts of the Close algorithm and the Pincer-Search algorithm. CMFI is suited not only to long maximal frequent itemsets (longer than 12) but also to short ones (shorter than 8). It performs very well in our experiments, especially when the maximal frequent itemsets in the database are of medium or long length. In most cases, the longest itemsets in a database are not frequent, and the maximal frequent itemsets are usually shorter than the longest itemset in the database. Consequently, algorithms suited only to long maximal frequent itemsets perform very poorly when the maximal frequent itemsets are medium or short, and algorithms suited only to short maximal frequent itemsets perform very poorly when they are medium or long. CMFI addresses both cases effectively. Our experiments compare our approach with the Close and Pincer-Search methods. CMFI performs 20% to 60% better than Pincer-Search when the frequent itemsets are of medium length, and 60% to 80% better than Close when the maximal closed frequent itemsets are long. The advantage grows as the database gets larger. In practice, most maximal candidate itemsets in a database are not frequent. For example, in supermarket transactions most frequent itemsets are of medium length. In many cases of market basket analysis, the maximal frequent itemsets are half the length of the longest itemsets. Because CMFI is very effective at mining maximal frequent itemsets of this kind, it is very useful in market basket analysis applications.
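The terms used in the abstract (frequent, closed, and maximal frequent itemsets) can be illustrated with a minimal brute-force sketch. This is not the CMFI, Close, or Pincer-Search algorithm; the toy transactions, the support threshold, and all function names are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy transaction database (not taken from the thesis).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
]
MIN_SUPPORT = 2  # absolute support threshold, chosen for illustration


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersect all transactions that contain `itemset` (closure operator)."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset(set.intersection(*covering)) if covering else frozenset(itemset)


def frequent_itemsets(transactions, min_support):
    """Enumerate frequent itemsets by brute force (only viable for tiny inputs)."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                result.append(candidate)
    return result


def maximal(itemsets):
    """Keep the itemsets that have no proper superset in the collection."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


frequent = frequent_itemsets(TRANSACTIONS, MIN_SUPPORT)
closed = sorted({closure(s, TRANSACTIONS) for s in frequent}, key=sorted)
maximal_frequent = maximal(frequent)
```

On this toy input there are seven frequent itemsets, three closed ones, and a single maximal frequent itemset. Every frequent itemset is a subset of some maximal one, which is why algorithms of this family try to reach the maximal itemsets without enumerating all of their subsets.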
author2 Don-lin Yang
author_facet Don-lin Yang
Chun-Jung Chu
朱俊榮
author Chun-Jung Chu
朱俊榮
spellingShingle Chun-Jung Chu
朱俊榮
An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
author_sort Chun-Jung Chu
title An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
title_sort efficient closure-based method for discovering maximal frequent itemsets
publishDate 2003
url http://ndltd.ncl.edu.tw/handle/burzr2