Thesaurus Extraction From the World Wide Web
Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 91 === As the amount of data on the WWW grows, more and more research aims to extract valuable information from the web. In this thesis, we present a system that automatically extracts a thesaurus from the WWW. The system uses two thesaurus extraction methods...
Main Authors: | Yi-Min Shih, 石逸民 |
---|---|
Other Authors: | Sun Wu |
Format: | Others |
Language: | zh-TW |
Online Access: | http://ndltd.ncl.edu.tw/handle/06154737758604082281 |
id |
ndltd-TW-091CCU00392123 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-091CCU00392123 2016-06-24T04:15:54Z http://ndltd.ncl.edu.tw/handle/06154737758604082281 Thesaurus Extraction From the World Wide Web 從全球資訊網擷取同義詞 Yi-Min Shih 石逸民 Master's National Chung Cheng University Institute of Computer Science and Information Engineering 91 As the amount of data on the WWW grows, more and more research aims to extract valuable information from the web. In this thesis, we present a system that automatically extracts a thesaurus from the WWW. The system uses two thesaurus extraction methods. In the first method, we rely on common writing practices to extract content from web pages, then extract thesaurus candidates from that content by syntactic analysis. We merge these candidates and filter out noise to produce a thesaurus dictionary. In the second method, we analyze anchor text on the web to produce web site aliases and abbreviations. We also collect data from web sites available in both BIG5 and GB encodings and extract the correspondences between simplified Chinese and traditional Chinese phrases. The thesaurus dictionary can be used to increase the number of search results and to make the results more precise. Sun Wu 吳昇 學位論文 ; thesis 42 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 91 === As the amount of data on the WWW grows, more and more research aims to extract valuable information from the web. In this thesis, we present a system that automatically extracts a thesaurus from the WWW. The system uses two thesaurus extraction methods. In the first method, we rely on common writing practices to extract content from web pages, then extract thesaurus candidates from that content by syntactic analysis. We merge these candidates and filter out noise to produce a thesaurus dictionary. In the second method, we analyze anchor text on the web to produce web site aliases and abbreviations. We also collect data from web sites available in both BIG5 and GB encodings and extract the correspondences between simplified Chinese and traditional Chinese phrases. The thesaurus dictionary can be used to increase the number of search results and to make the results more precise. |
author2 |
Sun Wu |
author_facet |
Sun Wu Yi-Min Shih 石逸民 |
author |
Yi-Min Shih 石逸民 |
spellingShingle |
Yi-Min Shih 石逸民 Thesaurus Extraction From the World Wide Web |
author_sort |
Yi-Min Shih |
title |
Thesaurus Extraction From the World Wide Web |
title_short |
Thesaurus Extraction From the World Wide Web |
title_full |
Thesaurus Extraction From the World Wide Web |
title_fullStr |
Thesaurus Extraction From the World Wide Web |
title_full_unstemmed |
Thesaurus Extraction From the World Wide Web |
title_sort |
thesaurus extraction from the world wide web |
url |
http://ndltd.ncl.edu.tw/handle/06154737758604082281 |
work_keys_str_mv |
AT yiminshih thesaurusextractionfromtheworldwideweb AT shíyìmín thesaurusextractionfromtheworldwideweb AT yiminshih cóngquánqiúzīxùnwǎngxiéqǔtóngyìcí AT shíyìmín cóngquánqiúzīxùnwǎngxiéqǔtóngyìcí |
_version_ |
1718322718563106816 |
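
The second method in the abstract (analyzing anchor text to find web site aliases and abbreviations) can be illustrated with a minimal sketch. This is not the thesis's implementation, which the record does not describe in detail. It assumes that different anchor texts pointing at the same URL are alias candidates for that site. The names `AnchorCollector`, `normalize_url`, `alias_groups`, and the `min_count` threshold are hypothetical, and the URLs in the usage note are placeholders.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def normalize_url(url):
    """Assumed normalization: drop the fragment and any trailing slash."""
    url, _fragment = urldefrag(url)
    return url.rstrip("/")


def alias_groups(pages, min_count=1):
    """Map each target URL to the sorted anchor texts that point at it.

    Only URLs with at least two distinct anchor texts are returned, since a
    single text gives no alias pair. min_count is a hypothetical noise filter:
    a text must occur at least that many times to be kept.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for html in pages:
        parser = AnchorCollector()
        parser.feed(html)
        parser.close()
        for href, text in parser.pairs:
            counts[normalize_url(href)][text] += 1

    groups = {}
    for url, texts in counts.items():
        kept = sorted(t for t, n in texts.items() if n >= min_count)
        if len(kept) > 1:
            groups[url] = kept
    return groups
```

For example, feeding two pages that link to the placeholder address `http://example.org/` with the texts "Example University" and "EU" yields one group containing both texts, which would then be candidates for a site alias and its abbreviation. A real system would need stronger noise filtering, since generic anchor texts such as "click here" also share targets.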