Information Extraction for Ancient Chinese Corpora
Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === academic year 87 === As the amount of digitized content grows at an impressive rate, automatically extracting information from a large digital library has become essential to using its contents effectively. This thesis discusses an information extract...
Main Authors: | Jia-Yang Chang, 張嘉洋 |
---|---|
Other Authors: | Yen-Jen Oyang |
Format: | Others |
Language: | zh-TW |
Published: | 1999 |
Online Access: | http://ndltd.ncl.edu.tw/handle/16427562231824811085 |
id | ndltd-TW-087NTU00392072 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-087NTU00392072; 2016-02-01T04:12:40Z; http://ndltd.ncl.edu.tw/handle/16427562231824811085; Information Extraction for Ancient Chinese Corpora; 古文獻中資訊擷取之研究 (A Study of Information Extraction from Ancient Chinese Documents); Jia-Yang Chang 張嘉洋; Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 87; Yen-Jen Oyang 歐陽彥正; 1999; thesis; 45; zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === academic year 87 === As the amount of digitized content grows at an impressive rate, automatically extracting information from a large digital library has become essential to using its contents effectively. This thesis discusses an information extraction utility developed for ancient Chinese texts. The corpus used as the test bed for the proposed scheme is the “Dan-Shin Files”. Because of the nature of this corpus, automatically segmenting words, parsing sentences, and working out the relationships between sentences are difficult. Nevertheless, documents of the same type typically follow very similar patterns, which greatly facilitates information extraction. Based on the properties observed in the corpus, the thesis builds a mechanism consisting of clustering, template mining, and information extraction, and describes the mechanism's usefulness, its effectiveness, and its role in NTUDLM. |
author2 | Yen-Jen Oyang |
author_facet | Yen-Jen Oyang; Jia-Yang Chang 張嘉洋 |
author | Jia-Yang Chang 張嘉洋 |
title | Information Extraction for Ancient Chinese Corpora |
publishDate | 1999 |
url | http://ndltd.ncl.edu.tw/handle/16427562231824811085 |