Incremental Data Mining Using Pre-large Itemsets

Bibliographic Details
Main Authors:	Ching-Yao Wang, 王慶堯
Other Authors:	Tzung-Pei Hong
Format:	Others
Language:	en_US
Published:	2000
Online Access:	http://ndltd.ncl.edu.tw/handle/90595717713161273256

Description
Summary:	Master's thesis === I-Shou University === Department of Information Engineering === 88 === Due to the increasing use of very large databases and data warehouses, mining useful information and knowledge from transactions has become an important research area. In the past, researchers usually assumed the database was static to simplify the data-mining problem. Most classic algorithms therefore focused on batch mining and did not use previously mined information when a database grew incrementally. In real-world applications, however, it is important to develop a mining algorithm that can incrementally maintain the discovered information as a database grows. In this thesis, we propose the concept of pre-large itemsets and design two novel, efficient incremental mining algorithms based on it. Pre-large itemsets are defined using two support thresholds, a lower support threshold and an upper support threshold, to reduce rescanning of the original database and to save maintenance costs. Pre-large itemsets act like a gap that reduces the movement of an itemset directly from large to small and vice versa. In the first proposed algorithm, the lower support threshold is fixed, and the number of new transactions that can be processed without rescanning the database increases dynamically as the database grows. Thus, the algorithm does not need to rescan the original database until that number of new transactions has arrived; the larger the database, the larger the allowed number of new transactions. In the second algorithm, the number of new transactions that can be processed without rescanning the database is fixed, and the lower support threshold is dynamically set closer to the upper support threshold as the database grows. Thus, the larger the database, the smaller the additional overhead of keeping the association rules consistent with the updated database. The proposed approaches therefore become increasingly efficient as a database grows. This characteristic is especially useful for real applications.
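
The two-threshold idea in the summary can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names are hypothetical, and the bound on new transactions is derived here from the definitions alone, assuming an itemset is "large" when its support is at least the upper threshold and "small" when it is below the lower threshold. Under that assumption, an itemset that was small in a database of d transactions stays below the upper threshold after t insertions whenever (lower * d + t) / (d + t) <= upper.

```python
from fractions import Fraction
from math import floor


def classify(count, db_size, lower, upper):
    """Label an itemset by its support ratio count / db_size (assumed definitions)."""
    support = Fraction(count, db_size)
    if support >= upper:
        return "large"
    if support >= lower:
        return "pre-large"
    return "small"


def max_new_transactions(db_size, lower, upper):
    """Fixed lower threshold: how many insertions are safe without a rescan.

    Solves (lower * d + t) / (d + t) <= upper for the largest integer t.
    """
    return floor((upper - lower) * db_size / (1 - upper))


def lower_threshold_for(db_size, t, upper):
    """Fixed number t of insertions: the lower threshold that makes t safe.

    Same inequality, solved for the lower threshold instead of t.
    """
    return upper - Fraction(t) * (1 - upper) / db_size


if __name__ == "__main__":
    lower, upper = Fraction(3, 10), Fraction(1, 2)
    for d in (100, 200, 1000):
        print(d, max_new_transactions(d, lower, upper),
              lower_threshold_for(d, 40, upper))
```

Exact fractions are used so that the floor is not affected by floating-point rounding. In this sketch the safe number of insertions grows in proportion to the database size when the lower threshold is fixed, and the computed lower threshold moves toward the upper threshold when the number of insertions is fixed, which matches the behaviour the summary describes for the two algorithms.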
