Summary: | Master's === I-Shou University === Department of Information Engineering === 88 === With the growing use of very large databases and data warehouses, mining useful information and knowledge from transactions has become an important research area. In the past, researchers usually assumed that the database was static in order to simplify the data-mining problem. Most classic algorithms therefore focused on batch mining and did not reuse previously mined information when a database grew incrementally. In real-world applications, however, it is important to have a mining algorithm that can incrementally maintain the discovered information as a database grows. In this thesis, we propose the concept of pre-large itemsets and design two novel, efficient incremental mining algorithms based on it. Pre-large itemsets are defined by two support thresholds, a lower support threshold and an upper support threshold, and are used to reduce rescanning of the original database and to save maintenance costs. Pre-large itemsets act as a gap that reduces the direct movement of an itemset from large to small and vice versa.
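To make the two-threshold definition concrete, the following is a minimal sketch of how an itemset might be classified. The function name `classify_itemset`, its parameters, and the choice of inclusive (`>=`) boundaries are assumptions made for illustration; the abstract does not say whether the thresholds are inclusive.

```python
from fractions import Fraction


def classify_itemset(count, num_transactions, lower, upper):
    """Classify an itemset as 'large', 'pre-large', or 'small'.

    count            -- number of transactions containing the itemset
    num_transactions -- total number of transactions in the database
    lower, upper     -- support thresholds, e.g. "0.3" and "0.5"
                        (strings or Fractions keep the arithmetic exact)
    """
    lower, upper = Fraction(lower), Fraction(upper)
    if not 0 <= lower <= upper <= 1:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if num_transactions <= 0:
        raise ValueError("num_transactions must be positive")

    support = Fraction(count, num_transactions)
    # Assumption: both boundaries are inclusive.
    if support >= upper:
        return "large"
    if support >= lower:
        return "pre-large"
    return "small"


if __name__ == "__main__":
    for count in (60, 40, 10):
        print(count, classify_itemset(count, 100, "0.3", "0.5"))
```

Under this sketch, an itemset whose support drifts below the upper threshold first becomes pre-large rather than small, which is the buffering role described above.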
In the first proposed algorithm, the lower support threshold is fixed, and the number of new transactions that can be processed without rescanning the original database increases dynamically as the database grows. The original database therefore does not need to be rescanned until a sufficient number of new transactions have arrived, and the larger the database becomes, the larger this allowed number is. In the second algorithm, the number of new transactions allowed without rescanning is fixed, and the lower support threshold is dynamically moved closer to the upper support threshold as the database grows. Thus, the larger the database, the smaller the additional overhead of keeping the association rules consistent with the updated database. The proposed approaches therefore become increasingly efficient as a database grows, a characteristic that is especially useful in real applications.
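The trade-off between the two algorithms can be illustrated with a worst-case argument. The abstract does not give the thesis's exact formulas, so the bound below is an assumption derived here, not a quotation. Suppose an itemset is small in an original database of d transactions (its count is below S_l · d), and every one of t new transactions contains it. It still cannot become large (count at least S_u · (d + t)) as long as t · (1 − S_u) ≤ (S_u − S_l) · d. The hypothetical helpers below solve this inequality for t (lower threshold fixed, as in the first algorithm) and for S_l (t fixed, as in the second).

```python
import math
from fractions import Fraction


def allowed_new_transactions(db_size, lower, upper):
    """Largest t with t * (1 - upper) <= (upper - lower) * db_size.

    Worst-case bound derived for illustration (an assumption, not the
    thesis's stated formula). Thresholds are strings or Fractions.
    """
    lower, upper = Fraction(lower), Fraction(upper)
    if not 0 <= lower <= upper < 1:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper < 1")
    return math.floor((upper - lower) * db_size / (1 - upper))


def required_lower_threshold(db_size, new_transactions, upper):
    """Lower threshold for which `new_transactions` is exactly the bound.

    Rearranges the same inequality for the lower threshold; the result
    approaches `upper` as db_size grows. Clamped at 0.
    """
    upper = Fraction(upper)
    if not 0 <= upper < 1:
        raise ValueError("upper must satisfy 0 <= upper < 1")
    lower = upper - Fraction(new_transactions) * (1 - upper) / db_size
    return max(lower, Fraction(0))


if __name__ == "__main__":
    # Fixed lower threshold: the allowed batch grows with the database.
    for d in (100, 1000, 10000):
        print(d, allowed_new_transactions(d, "0.3", "0.5"))
    # Fixed batch of 40: the lower threshold moves toward the upper one.
    for d in (100, 1000, 10000):
        print(d, float(required_lower_threshold(d, 40, "0.5")))
```

With the thresholds fixed, the allowed batch size scales linearly with the database size; with the batch size fixed, the gap between the two thresholds shrinks as the database grows, so fewer pre-large itemsets need to be kept. Both behaviours match the qualitative description above.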
|