Summary: | Doctoral === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === 96 === Owing to rapid developments in information technology and automatic data collection tools, large amounts of data have been collected and stored in various data repositories. Extracting valuable information from these data is key to improving business competitiveness. Data mining offers ways to automatically find nontrivial, previously unknown, and potentially useful knowledge in large databases. Frequent pattern mining plays an essential role in data mining.
Many methods have been proposed for discovering various types of frequent patterns, such as frequent itemsets, association rules, correlation rules, and sequential patterns. This dissertation introduces three types of frequent patterns: negative sequential patterns, negative fuzzy sequential patterns, and fuzzy correlation rules.
We propose an algorithm for mining negative sequential patterns, which considers not only the occurrence of itemsets in the transactions of a database but also their absence. The algorithm includes a candidate generation procedure that employs the apriori principle to eliminate many redundant candidates during mining. In addition, we define a function based on conditional probability that measures the interestingness of sequences, so that more interesting negative sequential patterns can be found.
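To make the idea of absence concrete, the following minimal Python sketch counts the support of a sequence that contains a negated itemset. It is an illustration under simplifying assumptions, not the dissertation's algorithm: the names `supports`, `support`, and `conditional_support` are hypothetical, the containment rule (a negated itemset must not occur between the matches of its neighbouring positive itemsets) is one simplified reading of negative sequential patterns, and the ratio at the end is plain conditional probability rather than the interestingness function defined in the dissertation.

```python
from itertools import combinations


def supports(sequence, pattern):
    """Return True if `sequence` (a list of sets) supports `pattern`.

    `pattern` is a list of (itemset, is_negated) pairs. Assumed semantics:
    positive itemsets must match elements at increasing positions, and a
    negated itemset must not be contained in any element lying strictly
    between the matches of its neighbouring positive itemsets.
    """
    positives = [itemset for itemset, negated in pattern if not negated]
    # Brute force over all placements of the positive itemsets.
    for positions in combinations(range(len(sequence)), len(positives)):
        if not all(p <= sequence[i] for p, i in zip(positives, positions)):
            continue
        seen = 0  # number of positive itemsets passed so far
        ok = True
        for itemset, negated in pattern:
            if not negated:
                seen += 1
                continue
            lo = positions[seen - 1] + 1 if seen > 0 else 0
            hi = positions[seen] if seen < len(positions) else len(sequence)
            if any(itemset <= sequence[j] for j in range(lo, hi)):
                ok = False
                break
        if ok:
            return True
    return False


def support(db, pattern):
    """Fraction of data sequences in `db` that support `pattern`."""
    return sum(supports(seq, pattern) for seq in db) / len(db)


def conditional_support(db, pattern):
    """P(pattern | positive part of pattern), a hypothetical stand-in measure."""
    positive_part = [(s, neg) for s, neg in pattern if not neg]
    base = support(db, positive_part)
    return support(db, pattern) / base if base else 0.0


db = [
    [{"a"}, {"c"}],
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"b"}, {"a"}, {"c"}],
    [{"c"}, {"a"}],
]
# The sequence <a, not b, c>
pattern = [(frozenset("a"), False), (frozenset("b"), True), (frozenset("c"), False)]
print(support(db, pattern), conditional_support(db, pattern))
```

The brute-force placement search is exponential in the pattern length and is only meant for toy data; an apriori-style miner would instead prune candidates whose positive sub-sequences are already infrequent.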
Additionally, most transaction data in real-world applications consist of quantitative values. To handle the various types of data found in quantitative databases and to discover negative sequential patterns in them, we propose an algorithm that combines fuzzy-set theory with the concept of negative sequential patterns to mine negative fuzzy sequential patterns from quantitative databases.
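As a rough illustration of how quantitative values can be brought into a fuzzy-set setting, the sketch below maps a purchased quantity to membership degrees in three fuzzy regions and computes a fuzzy support. The triangular membership functions, the region boundaries in `REGIONS`, and the function names are assumptions chosen for the example; the abstract does not specify which membership functions the dissertation uses.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy regions for item quantities: (left foot, peak, right foot).
REGIONS = {"low": (-5, 0, 5), "middle": (0, 5, 10), "high": (5, 10, 15)}


def fuzzify(quantity):
    """Map a quantity to its membership degree in each fuzzy region."""
    return {name: triangular(quantity, *abc) for name, abc in REGIONS.items()}


def fuzzy_support(db, item, region):
    """Average membership of (item, region) over all transactions in db."""
    total = sum(fuzzify(t[item])[region] for t in db if item in t)
    return total / len(db)


db = [{"milk": 3}, {"milk": 8}, {"bread": 5}]
print(fuzzify(3))
print(fuzzy_support(db, "milk", "middle"))
```

Each quantitative item thus becomes several fuzzy items such as `milk.low` or `milk.middle`, which can then be treated like ordinary items whose presence is a degree between 0 and 1.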
Furthermore, we propose a method for mining fuzzy correlation rules. It applies fuzzy correlation analysis to determine whether two fuzzy sub-itemsets of a fuzzy itemset are dependent, and then extracts more interesting fuzzy correlation rules from quantitative databases.
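The dependence test can be pictured with a small sketch that correlates the membership degrees of two fuzzy itemsets across transactions. This uses a Pearson-style coefficient over membership vectors as an assumed stand-in; the exact fuzzy correlation coefficient used in the dissertation is not given in the abstract, and the function name `fuzzy_correlation` is hypothetical.

```python
from math import sqrt


def fuzzy_correlation(mu_a, mu_b):
    """Pearson-style correlation between two membership-degree vectors.

    mu_a[i] and mu_b[i] are the membership degrees of two fuzzy itemsets
    in transaction i. Returns 0.0 when either vector is constant.
    """
    n = len(mu_a)
    mean_a = sum(mu_a) / n
    mean_b = sum(mu_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(mu_a, mu_b))
    var_a = sum((x - mean_a) ** 2 for x in mu_a)
    var_b = sum((y - mean_b) ** 2 for y in mu_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


mu_a = [0.2, 0.4, 0.6, 0.8]
mu_b = [0.1, 0.2, 0.3, 0.4]
mu_c = [0.8, 0.6, 0.4, 0.2]
print(fuzzy_correlation(mu_a, mu_b))  # close to 1: positively dependent
print(fuzzy_correlation(mu_a, mu_c))  # close to -1: negatively dependent
```

Under this reading, a rule between two fuzzy sub-itemsets would be kept only when the coefficient exceeds a chosen threshold, in addition to the usual support and confidence checks.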
Experiments on the three proposed algorithms show that they prune many redundant candidates during mining and effectively extract frequent patterns that are actually interesting.
|