Summary: | Master's === Tamkang University === Master's Program, Department of Computer Science and Information Engineering === 104 === The rapid development of information network technology has produced a vast repository of data.
Collecting relevant information in such a large body of data is rather difficult for any user.
This paper aims to help users grasp key information in a short period of time.
We observe that the terms occurring most frequently in an article can serve as keywords for that article.
Article theme can be easily grasped based on these keywords.
Therefore, users can find the information they want through keywords and significantly reduce unnecessary search time.
Proper word segmentation enables article theme extraction.
Article classification can then be achieved by differentiating themes.
We use 320 articles in the theme classification experiment. These articles are divided into two sets: training and testing.
There are 285 training samples,
all belonging to the sports news theme.
There are 15 testing samples whose themes are picked at random.
The method picks out 6 articles that belong to the sports news theme among the 15 testing samples.
Among the 20 negative samples, there are 4 false positives, all due to names related to sports events.
|
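
The term-frequency idea described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it tokenizes English text with a simple regular expression (Chinese text would first need a dedicated word segmenter), and the stop-word list and the function name `top_keywords` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "in", "of", "and", "to", "is", "was"}

def top_keywords(text, k=3):
    """Return the k most frequent non-stop-word terms in text."""
    # Lowercase the text and keep runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count every token that is not a stop word.
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(k)]

article = "The team won the match. The match was close, and the team celebrated."
print(top_keywords(article, k=2))
```

Terms that recur across many articles of one theme (here, words such as "team" and "match") could then serve as the basis for deciding whether a new article belongs to that theme.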