Summary: | Master's === National Taiwan University of Science and Technology === Department of Information Management === 95 === With the popularity of the World Wide Web, a large number of digital documents are now available on the Internet. Because text categorization makes these documents easier to manage, the text categorization problem has attracted many researchers.
In data mining, the discovery of association rules is an important research issue. Most research on association rules focuses on finding positive association rules. However, many studies point out that negative association rules are as important as positive ones.
Therefore, in this thesis, we mine both positive and negative association rules. Although the interest measure is commonly used for text categorization, we find that it is not sufficient on its own. Some studies use the correlation coefficient to judge the strength of a rule, but the correlation coefficient considers only the presence or absence of terms, not their weights. Because term frequencies are also important in categorization, we combine interest with term weight to enhance the discriminative power of positive and negative association rules. The combined measure is used to filter association rules so that the remaining rules are more meaningful and more representative of the classification criteria of a category. As a result, categorization results can be improved and new documents can be classified more accurately.
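As a rough illustration of the two ingredients named above, the following sketch computes the interest (lift) of a term-to-category rule and a simple term weight. It is not the thesis's actual formula: the abstract does not say how interest and term weight are combined, so the two values are only reported side by side. The function names, the toy corpus, and the use of mean relative term frequency as the weight are all assumptions made for this example.

```python
def interest(docs, labels, term, category):
    """Interest (lift) of the rule term -> category:
    P(term, category) / (P(term) * P(category)).
    Returns None when the term or the category never occurs."""
    n = len(docs)
    n_term = sum(1 for d in docs if term in d)
    n_cat = sum(1 for y in labels if y == category)
    n_both = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_term == 0 or n_cat == 0:
        return None
    # Algebraically equal to (n_both/n) / ((n_term/n) * (n_cat/n)).
    return (n_both * n) / (n_term * n_cat)


def rule_kind(value):
    """Interest above 1 suggests a positive rule, below 1 a negative rule."""
    if value is None:
        return "undefined"
    if value == 1:
        return "independent"
    return "positive" if value > 1 else "negative"


def mean_term_weight(docs, labels, term, category):
    """Hypothetical term weight: mean relative frequency of the term
    over the documents of the category (an assumption for this sketch)."""
    weights = [d.count(term) / len(d)
               for d, y in zip(docs, labels) if y == category and d]
    return sum(weights) / len(weights) if weights else 0.0


# Toy corpus (invented for illustration): each document is a list of tokens.
docs = [
    ["stock", "market", "stock"],
    ["stock", "price"],
    ["match", "goal"],
    ["goal", "team", "goal"],
]
labels = ["finance", "finance", "sports", "sports"]

for term in ("stock", "goal"):
    value = interest(docs, labels, term, "finance")
    weight = mean_term_weight(docs, labels, term, "finance")
    print(term, rule_kind(value), value, round(weight, 3))
```

Presence-based interest alone treats a term that occurs once the same as a term that occurs many times in a document; the frequency-based weight is one possible way to recover that difference when ranking or filtering rules.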
|