A Way to Extract Thesaurus Without using a Dictionary

Master's === Chienkuo Technology University (建國科技大學) === Institute of Mechatronoptic Systems (機電光系統研究所) === 92 === The Web is becoming the largest data repository in the world. Web mining studies how to discover knowledge in the diverse data resources on the Web for the benefit of Web-based information systems. Multilingual terminological resour...

Bibliographic Details
Main Author: 施睿倬
Other Authors: 林義証
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/91344231752999588977
id ndltd-TW-092CTU05490001
record_format oai_dc
spelling ndltd-TW-092CTU05490001 2015-10-13T14:52:52Z http://ndltd.ncl.edu.tw/handle/91344231752999588977 A Way to Extract Thesaurus Without using a Dictionary 一個不需辭典輔助之中英同義詞抽取方法 施睿倬 Master's Chienkuo Technology University (建國科技大學) Institute of Mechatronoptic Systems (機電光系統研究所) 92 The Web is becoming the largest data repository in the world. Web mining studies how to discover knowledge in the diverse data resources on the Web for the benefit of Web-based information systems. Multilingual terminological resources, including multilingual lexicons and thesauri, are valuable for academic research and for practical applications such as machine translation, cross-language information retrieval, and information exchange in electronic commerce. A machine translation system requires a bilingual dictionary. Because many unknown words keep emerging on the Internet, a traditional dictionary cannot contain all the necessary vocabulary, so an efficient way to update the dictionary is needed. We propose a method for extracting a Chinese-English thesaurus from a Chinese news corpus. The method needs no dictionary and can be used to update a bilingual dictionary automatically. It has two phases: the first collects useful news data from the Web, and the second extracts the thesaurus from the data collected in the first phase. The results show that the method is very promising. 林義証 thesis 64 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Chienkuo Technology University (建國科技大學) === Institute of Mechatronoptic Systems (機電光系統研究所) === 92 === The Web is becoming the largest data repository in the world. Web mining studies how to discover knowledge in the diverse data resources on the Web for the benefit of Web-based information systems. Multilingual terminological resources, including multilingual lexicons and thesauri, are valuable for academic research and for practical applications such as machine translation, cross-language information retrieval, and information exchange in electronic commerce. A machine translation system requires a bilingual dictionary. Because many unknown words keep emerging on the Internet, a traditional dictionary cannot contain all the necessary vocabulary, so an efficient way to update the dictionary is needed. We propose a method for extracting a Chinese-English thesaurus from a Chinese news corpus. The method needs no dictionary and can be used to update a bilingual dictionary automatically. It has two phases: the first collects useful news data from the Web, and the second extracts the thesaurus from the data collected in the first phase. The results show that the method is very promising.
author2 林義証
author_facet 林義証
施睿倬
author 施睿倬
spellingShingle 施睿倬
A Way to Extract Thesaurus Without using a Dictionary
author_sort 施睿倬
title A Way to Extract Thesaurus Without using a Dictionary
title_short A Way to Extract Thesaurus Without using a Dictionary
title_full A Way to Extract Thesaurus Without using a Dictionary
title_fullStr A Way to Extract Thesaurus Without using a Dictionary
title_full_unstemmed A Way to Extract Thesaurus Without using a Dictionary
title_sort way to extract thesaurus without using a dictionary
url http://ndltd.ncl.edu.tw/handle/91344231752999588977
work_keys_str_mv AT shīruìzhuō awaytoextractthesauruswithoutusingadictionary
AT shīruìzhuō yīgèbùxūcídiǎnfǔzhùzhīzhōngyīngtóngyìcíchōuqǔfāngfǎ
AT shīruìzhuō waytoextractthesauruswithoutusingadictionary
_version_ 1717759429260083200