Summary: | Master's thesis === National Central University === Department of Information Management === 106 === As technology advances, the importance of "information" is increasingly recognized. Researchers in many fields have therefore turned to data mining and developed many solutions, hoping to mine the information hidden in large databases. One important method, Attribute Oriented Induction (AOI), was proposed in 1990.
AOI mines knowledge by generalizing each attribute of a relational database through concept tree ascension, summarizing the data according to concept trees that encode the user's background knowledge. However, AOI has three main problems. The first concerns the threshold on the number of data items in the summary when a large amount of data is summarized: with such a threshold, there is no guarantee that each summarized result has a certain degree of certainty. The second is that the setting of the traditional AOI threshold affects the clarity of the induction result. The third is that AOI cannot filter noise: it mixes information and noise in the data during summarization, which leaves the summarized data unclear and blurred.
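The concept-tree ascension described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the concept tree `CITY_TREE`, the function names, and the rule of always generalizing the attribute with the most distinct values are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical concept tree for one attribute (child -> parent); "ANY" is the root.
CITY_TREE = {
    "Taipei": "North", "Taoyuan": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "ANY", "South": "ANY",
}

def ascend(value, tree):
    """Climb one level in the concept tree; values without a parent stay as they are."""
    return tree.get(value, value)

def aoi(rows, trees, threshold):
    """Generalize attributes until at most `threshold` distinct tuples remain.

    `trees[i]` is the concept tree of attribute i (an empty dict means the
    attribute cannot be generalized). Returns each generalized tuple with its count.
    """
    rows = [tuple(r) for r in rows]
    while len(set(rows)) > threshold:
        # Attributes that can still ascend, ranked by number of distinct values.
        candidates = []
        for i, tree in enumerate(trees):
            values = {r[i] for r in rows}
            if any(v in tree for v in values):
                candidates.append((len(values), i))
        if not candidates:
            break  # every attribute is already at its root
        _, i = max(candidates)
        rows = [r[:i] + (ascend(r[i], trees[i]),) + r[i + 1:] for r in rows]
    return Counter(rows)
```

With four tuples that differ only in city and a threshold of 2, the sketch merges them into one `North` and one `South` tuple, each with a count of 2. The threshold alone decides when generalization stops, which is the behavior the problems listed above refer to.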
This study introduces a cost measure to quantify the detail lost when attributes are generalized, and applies a cost constraint so that each generalized tuple retains a certain degree of certainty. Based on two data selection methods (Minimum cost, Random), we propose two algorithms built on agglomerative hierarchical clustering. Finally, we find that one of our methods outperforms traditional AOI and provides more useful information.
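The abstract does not define the cost measure, so the following is only a sketch of how cost-constrained agglomerative merging could look under stated assumptions. It handles a single attribute, takes the cost of a generalized tuple to be the total number of concept-tree levels its members climb, and uses the hypothetical names `PARENT`, `generalize`, and `cluster`.

```python
import itertools
import random

# Hypothetical concept tree (child -> parent); "ANY" is the root.
PARENT = {
    "Taipei": "North", "Taoyuan": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "ANY", "South": "ANY",
}

def path_to_root(value):
    """Return [value, parent, ..., root]."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def generalize(values):
    """Lowest common ancestor of `values` and the assumed cost of reaching it.

    Cost (an assumption): the sum over all values of the levels climbed.
    """
    paths = [path_to_root(v) for v in values]
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(node for node in paths[0] if node in common)
    return lca, sum(p.index(lca) for p in paths)

def cluster(values, max_cost, select="min", seed=0):
    """Merge clusters bottom-up while each merged cluster costs at most `max_cost`.

    select="min" takes the cheapest feasible merge; any other value takes a
    random feasible merge (mirroring the two selection methods named above).
    """
    rng = random.Random(seed)
    clusters = [[v] for v in values]
    while True:
        feasible = []
        for a, b in itertools.combinations(range(len(clusters)), 2):
            _, cost = generalize(clusters[a] + clusters[b])
            if cost <= max_cost:
                feasible.append((cost, a, b))
        if not feasible:
            break
        _, a, b = min(feasible) if select == "min" else rng.choice(feasible)
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return [(generalize(c)[0], len(c)) for c in clusters]
```

In this sketch the constraint is checked per generalized tuple rather than on the size of the summary, so no tuple is generalized beyond the allowed loss of detail. A budget of 2 keeps the four example cities at the `North` and `South` level, while a budget of 8 lets them collapse into `ANY`.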
|