A Way to Extract Thesaurus Without using a Dictionary

Bibliographic Details
Main Author: 施睿倬
Other Authors: 林義証
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/91344231752999588977
Description
Summary: Master's === 建國科技大學 === 機電光系統研究所 === 92 === The Web is becoming the largest data repository in the world. Web mining studies how to discover knowledge in the diverse data resources on the Web for the benefit of Web-based information systems. Multilingual terminological resources, including multilingual lexicons and thesauri, are valuable both for academic research and for practical applications such as machine translation, cross-language information retrieval, and information exchange in electronic commerce. A machine translation system requires a bilingual dictionary. Because many previously unknown words keep emerging on the Internet, a traditional dictionary cannot contain all the necessary vocabulary, so an efficient way to update the dictionary is needed. We propose a way to extract a Chinese-English thesaurus from a Chinese news corpus. The proposed method requires no dictionary and can be used to update a bilingual dictionary automatically. It extracts the thesaurus from a corpus in two phases: the first collects useful news data from the Web, and the second extracts the thesaurus from the data collected in the first. The results show that the method is very promising.
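
The summary does not say how translation pairs are identified in the second phase. Purely as an illustration, the sketch below assumes one common dictionary-free heuristic: Chinese news text often gives an English term in parentheses right after its Chinese equivalent. The pattern, the function names, and the `max_len` and `min_count` parameters are all hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter

# Assumption: a run of Chinese characters immediately followed by an
# English term in ASCII or full-width parentheses marks a translation pair.
PATTERN = re.compile(
    r"([\u4e00-\u9fff]{2,})[(（]([A-Za-z][A-Za-z .\-]*[A-Za-z])[)）]"
)


def candidate_pairs(sentence, max_len=6):
    """Yield (chinese_candidate, english) pairs found in one sentence."""
    for match in PATTERN.finditer(sentence):
        chinese_run, english = match.group(1), match.group(2)
        # No word segmentation is available without a dictionary, so every
        # suffix of the run (2..max_len characters) is kept as a candidate.
        for n in range(2, min(max_len, len(chinese_run)) + 1):
            yield chinese_run[-n:], english


def extract_thesaurus(sentences, min_count=2, max_len=6):
    """Map each English term to its most frequent Chinese candidate."""
    counts = Counter()
    for sentence in sentences:
        counts.update(candidate_pairs(sentence, max_len))

    best = {}
    for (chinese, english), count in counts.items():
        if count < min_count:
            continue  # seen too rarely to trust
        # Prefer the higher count; break ties with the longer candidate.
        score = (count, len(chinese))
        if english not in best or score > best[english][0]:
            best[english] = (score, chinese)
    return {english: chinese for english, (_, chinese) in best.items()}


news = [
    "他加入世界貿易組織(WTO)的談判",
    "關於世界貿易組織（WTO）的報導",
]
thesaurus = extract_thesaurus(news, max_len=8)
```

Candidates that include surrounding context, such as 入世界貿易組織, differ from sentence to sentence and fall below `min_count`, while the shared term is seen in both. The first phase (collecting news from the Web) is not sketched here, since the summary gives no details about it.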