Thesaurus Extraction From the World Wide Web

Bibliographic Details
Main Authors: Yi-Min Shih, 石逸民
Other Authors: Sun Wu
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/06154737758604082281
Description
Summary: Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 91 === As the amount of data on the WWW grows, more and more research aims to extract valuable information from the web. In this thesis, we present a system that automatically extracts a thesaurus from the WWW. The system uses two thesaurus extraction methods. In the first method, we rely on common writing practices to extract content from web pages, and then apply syntactic analysis to that content to extract thesaurus candidates. We merge these candidates, reduce the noise among them, and produce a thesaurus dictionary. In the second method, we analyze anchor text from the web to produce web site aliases and abbreviations. We also collect data from web sites encoded in BIG5 and GB, and extract the correspondence between Simplified Chinese and standard Chinese phrases. The thesaurus dictionary can be used to expand search results and make them more precise.
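The second method in the summary (deriving web site aliases from anchor text) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual implementation: the record does not describe the algorithm, so the grouping rule, the `min_count` threshold, and the names `AnchorCollector` and `alias_candidates` are all hypothetical. The idea shown is that different anchor texts pointing at the same URL are candidate aliases for that site.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def alias_candidates(pages, min_count=2):
    """Group anchor texts by target URL (hypothetical heuristic).

    An anchor text is kept only if it links to the URL at least
    `min_count` times across all pages, which filters out one-off
    noise. A URL is reported only if at least two distinct texts
    survive, since a single name is not an alias pair.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for html in pages:
        parser = AnchorCollector()
        parser.feed(html)
        parser.close()
        for href, text in parser.pairs:
            # Assumption: a trailing slash does not change the target.
            counts[href.rstrip("/")][text] += 1

    result = {}
    for url, texts in counts.items():
        kept = sorted(t for t, n in texts.items() if n >= min_count)
        if len(kept) >= 2:
            result[url] = kept
    return result
```

In this sketch, a frequency threshold stands in for the noise reduction step; a real system would likely need stronger filtering, for example against generic anchors that happen to repeat.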