An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets

Master's === Feng Chia University === Graduate Institute of Information Engineering === 91 === Existing algorithms for mining association rules still suffer from many problems related to database scans. Some of them (e.g., Apriori) may require too many database scans, especially when the maximal frequent itemsets are long (long...


Bibliographic Details
Main Authors:	Chun-Jung Chu, 朱俊榮
Other Authors:	Don-lin Yang
Format:	Others
Language:	en_US
Published:	2003
Online Access:	http://ndltd.ncl.edu.tw/handle/burzr2

id	ndltd-TW-091FCU05392009
record_format	oai_dc
spelling	ndltd-TW-091FCU05392009 2018-06-25T06:06:39Z http://ndltd.ncl.edu.tw/handle/burzr2 An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets 一個以封閉式集合為基礎能夠有效率尋找最大高頻項目組的方法 Chun-Jung Chu 朱俊榮 Master's, Feng Chia University, Graduate Institute of Information Engineering, 91 Don-lin Yang 楊東麟 2003 學位論文 ; thesis 62 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === Feng Chia University === Graduate Institute of Information Engineering === 91 === Existing algorithms for mining association rules still suffer from many problems related to database scans. Some of them (e.g., Apriori) may require too many database scans, especially when the maximal frequent itemsets are long (longer than 12 items). Others (e.g., Pincer-Search) may perform well when the maximal frequent itemsets are long but perform very poorly when they are of medium length (about 8-12 items). We therefore propose the CMFI (Close Maximum Frequent Itemset) search algorithm, which combines the concepts of the Close and Pincer-Search algorithms. CMFI is suited not only to long maximal frequent itemsets (longer than 12) but also to short ones (shorter than 8), and it performs very well in our experiments, especially when the maximal frequent itemsets in the database are medium or long. In most cases, the longest itemsets in a database are not frequent, so the maximal frequent itemsets are usually shorter than the longest itemset. As a result, algorithms suited only to long maximal frequent itemsets perform very poorly when those itemsets are medium or short, while algorithms suited only to short maximal frequent itemsets perform very poorly when they are medium or long. CMFI addresses both cases effectively. Our experiments compare CMFI with the Close and Pincer-Search methods: CMFI performs 20% to 60% better than Pincer-Search when the frequent itemsets are of medium length, and 60% to 80% better than Close when the maximal closed frequent itemsets are long. The advantage grows as the database becomes larger. In practice, most maximal candidate itemsets in a database are not frequent. For example, in supermarket transactions we found that most frequent itemsets are of medium length, and in many market basket analyses the maximal frequent itemsets are half as long as the longest itemsets. Because CMFI is very effective for mining maximal frequent itemsets of this kind, it is well suited to market basket analysis applications.
author2	Don-lin Yang
author_facet	Don-lin Yang Chun-Jung Chu 朱俊榮
author	Chun-Jung Chu 朱俊榮
spellingShingle	Chun-Jung Chu 朱俊榮 An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
author_sort	Chun-Jung Chu
title	An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
title_short	An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
title_full	An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
title_fullStr	An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
title_full_unstemmed	An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets
title_sort	efficient closure-based method for discovering maximal frequent itemsets
publishDate	2003
url	http://ndltd.ncl.edu.tw/handle/burzr2
work_keys_str_mv	AT chunjungchu anefficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets AT zhūjùnróng anefficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets AT chunjungchu yīgèyǐfēngbìshìjíhéwèijīchǔnénggòuyǒuxiàolǜxúnzhǎozuìdàgāopínxiàngmùzǔdefāngfǎ AT zhūjùnróng yīgèyǐfēngbìshìjíhéwèijīchǔnénggòuyǒuxiàolǜxúnzhǎozuìdàgāopínxiàngmùzǔdefāngfǎ AT chunjungchu efficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets AT zhūjùnróng efficientclosurebasedmethodfordiscoveringmaximalfrequentitemsets
_version_	1718706385864097792
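
The abstract in this record relies on the notions of frequent, closed, and maximal frequent itemsets. As a rough illustration of those definitions only, the following Python sketch enumerates them by brute force. It is not the CMFI, Close, or Pincer-Search algorithm from the thesis; the function names, the sample transactions, and the support threshold are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force illustration: map each frequent itemset to its support count.

    Every candidate is counted against every transaction, which is exactly
    the repeated scanning cost the abstract describes as a problem.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent[cset] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size exists.
            break
    return frequent


def closed_itemsets(frequent):
    """Frequent itemsets that have no proper superset with the same support."""
    return {
        s for s, count in frequent.items()
        if not any(s < t and frequent[t] == count for t in frequent)
    }


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


# Hypothetical transactions; each letter stands for one item.
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "ad", "bc")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
```

With these sample transactions and a minimum support of 2, the only maximal frequent itemset is {a, b, c}, and every maximal frequent itemset is also closed. The thesis's stated contribution is finding such sets with fewer database scans than a naive enumeration like this one.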

