A Study of Rule-Extraction Method for Classification Problem

Bibliographic Details
Main Authors: Yeh, Yao-Hua, 葉燿華
Other Authors: Horng Wen-Bing
Format: Others
Language: zh-TW
Published: 1998
Online Access: http://ndltd.ncl.edu.tw/handle/91269599622970791418
Description
Summary: Master's === Tamkang University === Department of Computer Science and Information Engineering === 86 === In an era of information explosion, learning knowledge from large amounts of data is an important task, and rule induction has consequently become an important research topic. Michalski's AQ (1969) and Quinlan's ID3 (1983) did well in this respect, and many researchers have since improved on them in various ways. The resulting algorithms fall into two major categories: ID3-family algorithms, which are decision-tree-oriented, and rule-based algorithms, which are rule-oriented. The main purpose of this thesis is to extract rules and to investigate rule combination and simplification. The thesis adopts an ID3-family algorithm and uses it to extract rules directly. The proposed algorithm divides the data into two parts: training data and testing data. A "window", a fixed-size region of memory, holds the data being processed, and rules are induced from it. First, the training data are put into the window and rules are induced. Second, each testing example is checked against the induced rules, and additional rules are induced so that the rules account for all the data. Finally, the proposed combination algorithm simplifies all induced rules to produce the final rule set. The proposed method has three advantages. The first is space saving: because all data are indexed for access, it is not necessary to hold all data in memory at once; only the needed data are loaded into memory and the rest remain on disk. The second is time saving: rule induction is faster because constructing a decision tree takes more time than constructing a table, and fewer attributes are checked in the latter case than in the former. The third is rule simplification: by using unseen examples together with the original data to induce new rules, we obtain more simplified rules, which serve as knowledge that people can learn from the experimental data.
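
The three steps in the summary (induce rules from a window of training data, check each testing example against those rules, then simplify the result) can be illustrated with a small sketch. It is not the thesis's algorithm, whose details the summary does not give. The function names `induce`, `simplify`, and `extract_rules`, the toy weather data, and the simplification strategy (dropping any condition whose removal causes no conflict with the window) are all assumptions made for illustration. The sketch also lets the window grow freely, ignoring the fixed memory size and the disk indexing the summary mentions.

```python
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, class label)
Rule = Tuple[Dict[str, str], str]     # (conditions, predicted class)


def matches(conditions: Dict[str, str], attrs: Dict[str, str]) -> bool:
    """A rule fires when every one of its conditions holds for the example."""
    return all(attrs.get(a) == v for a, v in conditions.items())


def classify(rules: List[Rule], attrs: Dict[str, str]) -> Optional[str]:
    """Return the class of the first matching rule, or None if none match."""
    for conditions, label in rules:
        if matches(conditions, attrs):
            return label
    return None


def induce(window: List[Example]) -> List[Rule]:
    """Hypothetical stand-in learner: one fully specific rule per example."""
    return [(dict(attrs), label) for attrs, label in window]


def simplify(rules: List[Rule], data: List[Example]) -> List[Rule]:
    """Drop conditions that are not needed to separate the classes in `data`,
    then discard duplicate rules (an assumed, simplistic combination step)."""
    simplified: List[Rule] = []
    for conditions, label in rules:
        kept = dict(conditions)
        for attr in sorted(conditions):
            reduced = {a: v for a, v in kept.items() if a != attr}
            conflict = any(
                lbl != label and matches(reduced, attrs) for attrs, lbl in data
            )
            if not conflict:
                kept = reduced
        rule = (kept, label)
        if rule not in simplified:
            simplified.append(rule)
    return simplified


def extract_rules(training: List[Example], testing: List[Example]) -> List[Rule]:
    """Windowing loop: start from the training data, then add every testing
    example the current rules get wrong and re-induce."""
    window = list(training)
    rules = simplify(induce(window), window)
    for attrs, label in testing:
        if classify(rules, attrs) != label:
            window.append((attrs, label))
            rules = simplify(induce(window), window)
    return rules


# Toy data (invented for this sketch).
training = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
testing = [
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
rules = extract_rules(training, testing)
for conditions, label in rules:
    print(conditions, "->", label)
```

On this toy data the initial window yields rules keyed on `windy`, both testing examples are misclassified and added to the window, and the loop ends with two one-condition rules keyed on `outlook`. This shows how unseen examples can lead to a simpler final rule set.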