Summary: | Master's === Tamkang University === Department of Computer Science and Information Engineering === 86 === In the era of information explosion, learning knowledge
from large amounts of data is an important task.
Consequently, rule induction has become an important research
topic. Michalski's AQ (1969) and Quinlan's ID3 (1983) made important
contributions in this area, and many researchers have since
improved on them in various ways. The resulting algorithms fall into two
major categories: ID3-family algorithms and rule-based
algorithms. The major difference between them is that ID3-family
algorithms are decision-tree-oriented, whereas rule-based
algorithms are rule-oriented. The major purpose of this
thesis is to extract rules and to investigate rule combination
and simplification. It adopts an ID3-family algorithm and
uses it to extract rules directly. The proposed algorithm
divides the data into two parts: training data and testing data.
A "window" occupying a fixed amount of memory is used to
hold the data from which rules are induced. First, the
training data are put into the window and rules are induced.
Second, each test example is checked against the induced rules, and
further rules are induced to account for all the data. Finally, the proposed
combination algorithm simplifies all induced rules to produce the
final simplified rule set. There are three advantages of the
proposed method. The first is space saving: because all
data are indexed for access, it isn't necessary to hold
all data in memory at the same time. Only the needed
data are kept in memory, and the rest stay on disk. The second advantage is
time saving. Rule induction is faster
because decision tree construction takes more time than table
construction, and fewer attributes are checked in the latter case
than in the
former. The third advantage is rule simplification. By using
unseen examples together with the original data to induce new rules, we
obtain more simplified rules, which serve as knowledge that people can learn
from the experimental data.
|
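
The window-based procedure in the summary (induce rules from the training data, check each test example against them, induce more rules when needed) can be sketched roughly as follows. This is only an illustrative sketch under assumptions, not the thesis's algorithm: the function names (`entropy`, `induce_rules`, `classify`, `windowed_induction`), the data layout, and the choice to re-induce from scratch whenever a test example is misclassified are all hypothetical. The window here simply grows rather than staying within fixed memory, and the rule combination step is omitted because the summary does not describe how it works.

```python
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy of the class labels; each row is (features, label)."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def split_by(rows, attr):
    """Group rows by their value for one attribute."""
    groups = {}
    for feats, label in rows:
        groups.setdefault(feats[attr], []).append((feats, label))
    return groups


def induce_rules(rows, attrs, conditions=()):
    """ID3-style recursive splitting that returns rules directly.

    Each rule is (conditions, label), where conditions maps attribute -> value.
    """
    labels = Counter(label for _, label in rows)
    if len(labels) == 1 or not attrs:
        # Pure node, or no attributes left: emit a rule with the majority label.
        return [(dict(conditions), labels.most_common(1)[0][0])]

    def remainder(attr):
        # Weighted entropy after splitting; the lowest value means the highest gain.
        return sum(len(g) / len(rows) * entropy(g)
                   for g in split_by(rows, attr).values())

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    rules = []
    for value, group in split_by(rows, best).items():
        rules += induce_rules(group, rest, conditions + ((best, value),))
    return rules


def classify(rules, feats):
    """Label of the first rule whose conditions all match, or None."""
    for conds, label in rules:
        if all(feats.get(a) == v for a, v in conds.items()):
            return label
    return None


def windowed_induction(training, testing, attrs):
    """Assumed windowing loop: grow the window with misclassified test examples."""
    window = list(training)
    rules = induce_rules(window, attrs)
    for feats, label in testing:
        if classify(rules, feats) != label:
            window.append((feats, label))
            rules = induce_rules(window, attrs)
    return rules
```

Calling `windowed_induction(training, testing, attrs)` with lists of `(features, label)` pairs returns rules that can be passed to `classify`. A real implementation along the lines of the summary would keep the window bounded and read indexed examples from disk on demand.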