Information Extraction for Ancient Chinese Corpora

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 87 === As the amount of digitized content grows at an impressive speed, how to automatically extract information from a large digital library becomes an essential issue for utilizing the content effectively. This thesis discusses an information extract...

Bibliographic Details
Main Authors: Jia-Yang Chang, 張嘉洋
Other Authors: Yen-Jen Oyang
Format: Others
Language: zh-TW
Published: 1999
Online Access: http://ndltd.ncl.edu.tw/handle/16427562231824811085
id ndltd-TW-087NTU00392072
record_format oai_dc
spelling ndltd-TW-087NTU00392072 2016-02-01T04:12:40Z http://ndltd.ncl.edu.tw/handle/16427562231824811085 Information Extraction for Ancient Chinese Corpora 古文獻中資訊擷取之研究 Jia-Yang Chang 張嘉洋 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 87 As the amount of digitized content grows at an impressive speed, how to automatically extract information from a large digital library becomes an essential issue for utilizing the content effectively. This thesis discusses an information extraction utility developed for the ancient Chinese language. The corpus used as the test bed of the proposed scheme is called “Dan-Shin Files”. Due to the nature of the corpus, automatically segmenting words, parsing sentences, and figuring out the relationships between sentences are not easy. Nevertheless, by their nature, documents of the same type typically share very similar patterns, which greatly facilitates information extraction. Based on the properties observed in the corpus, this thesis builds a mechanism that comprises clustering, template mining, and information extraction. The usefulness and effectiveness of the mechanism and its role in NTUDLM are described in this thesis. Yen-Jen Oyang 歐陽彥正 1999 Degree thesis ; thesis 45 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 87 === As the amount of digitized content grows at an impressive speed, how to automatically extract information from a large digital library becomes an essential issue for utilizing the content effectively. This thesis discusses an information extraction utility developed for the ancient Chinese language. The corpus used as the test bed of the proposed scheme is called “Dan-Shin Files”. Due to the nature of the corpus, automatically segmenting words, parsing sentences, and figuring out the relationships between sentences are not easy. Nevertheless, by their nature, documents of the same type typically share very similar patterns, which greatly facilitates information extraction. Based on the properties observed in the corpus, this thesis builds a mechanism that comprises clustering, template mining, and information extraction. The usefulness and effectiveness of the mechanism and its role in NTUDLM are described in this thesis.
author2 Yen-Jen Oyang
author_facet Yen-Jen Oyang
Jia-Yang Chang
張嘉洋
author Jia-Yang Chang
張嘉洋
spellingShingle Jia-Yang Chang
張嘉洋
Information Extraction for Ancient Chinese Corpora
author_sort Jia-Yang Chang
title Information Extraction for Ancient Chinese Corpora
title_short Information Extraction for Ancient Chinese Corpora
title_full Information Extraction for Ancient Chinese Corpora
title_fullStr Information Extraction for Ancient Chinese Corpora
title_full_unstemmed Information Extraction for Ancient Chinese Corpora
title_sort information extraction for ancient chinese corpora
publishDate 1999
url http://ndltd.ncl.edu.tw/handle/16427562231824811085
work_keys_str_mv AT jiayangchang informationextractionforancientchinesecorpora
AT zhāngjiāyáng informationextractionforancientchinesecorpora
AT jiayangchang gǔwénxiànzhōngzīxùnxiéqǔzhīyánjiū
AT zhāngjiāyáng gǔwénxiànzhōngzīxùnxiéqǔzhīyánjiū
_version_ 1718174350986706944