Summary: | Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 91 === As the amount of data on the WWW grows, more and more research aims to extract valuable information from the web. In this thesis, we present a system that automatically extracts a thesaurus from the WWW.
The system uses two thesaurus extraction methods. The first method relies on common writing conventions to extract content from web pages. It then applies syntactic analysis to that content to identify candidate thesaurus entries, merges the candidates, and filters out noise to produce a thesaurus dictionary. The second method analyzes anchor text on the web to produce web site aliases and abbreviations. We also collect data from web sites encoded in BIG5 and GB, and extract the correspondence between simplified Chinese and traditional Chinese phrases. The thesaurus dictionary can then be used to broaden search results and make them more precise.
|
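The second method and the search use case described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `alias_candidates`, `build_thesaurus`, and `expand_query`, the `min_count` threshold, and the example URLs are all hypothetical, and the grouping rule (anchor texts that point to the same URL are treated as candidate aliases of one another) is an assumption about how anchor text might be used.

```python
from collections import Counter, defaultdict


def alias_candidates(links, min_count=2):
    """Group anchor texts by target URL (assumed heuristic, not the thesis's rule).

    links: iterable of (anchor_text, url) pairs.
    Anchor texts seen fewer than min_count times for a URL are dropped as noise.
    Only URLs with at least two surviving texts are returned.
    """
    counts = defaultdict(Counter)
    for anchor_text, url in links:
        text = anchor_text.strip()
        if text:
            counts[url][text] += 1

    groups = {}
    for url, counter in counts.items():
        kept = sorted(t for t, n in counter.items() if n >= min_count)
        if len(kept) > 1:
            groups[url] = kept
    return groups


def build_thesaurus(groups):
    """Map every term to the other terms in its alias group."""
    thesaurus = defaultdict(set)
    for terms in groups.values():
        for term in terms:
            thesaurus[term].update(t for t in terms if t != term)
    return dict(thesaurus)


def expand_query(term, thesaurus):
    """Return the query term followed by its known equivalents."""
    return [term] + sorted(thesaurus.get(term, ()))


# Hypothetical input: (anchor text, target URL) pairs harvested from pages.
links = [
    ("CCU", "http://example.org/ccu"),
    ("CCU", "http://example.org/ccu"),
    ("National Chung Cheng University", "http://example.org/ccu"),
    ("National Chung Cheng University", "http://example.org/ccu"),
    ("click here", "http://example.org/ccu"),  # seen once, filtered as noise
    ("Home", "http://example.org/other"),
    ("Home", "http://example.org/other"),
]

thesaurus = build_thesaurus(alias_candidates(links))
print(expand_query("CCU", thesaurus))
# -> ['CCU', 'National Chung Cheng University']
```

Searching with every term returned by `expand_query` is one simple way a thesaurus dictionary could broaden search results. The frequency threshold only stands in for the noise reduction step that the summary mentions without detail.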