Information Extraction for Ancient Chinese Corpora

Bibliographic Details
Main Authors: Jia-Yang Chang, 張嘉洋
Other Authors: Yen-Jen Oyang
Format: Others
Language: zh-TW
Published: 1999
Online Access: http://ndltd.ncl.edu.tw/handle/16427562231824811085
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === academic year 87 === As the volume of digitized content grows rapidly, automatically extracting information from a large digital library has become essential to using that content effectively. This thesis describes an information extraction utility developed for ancient Chinese. The test bed for the proposed scheme is a corpus called “Dan-Shin Files”. Because of the nature of the corpus, automatically segmenting words, parsing sentences, and determining the relationships between sentences are difficult. However, documents of the same type typically follow very similar patterns, which greatly facilitates information extraction. Based on these observed properties, the thesis builds a mechanism comprising clustering, template mining, and information extraction. It also describes the usefulness and effectiveness of the mechanism and its role in NTUDLM.
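The three stages named in the summary (clustering, template mining, information extraction) can be illustrated with a minimal sketch. This is not the thesis's implementation, which the record does not describe. It assumes character-level similarity from Python's standard `difflib`, a greedy single-pass clustering, and a template mined from only two documents. The function names, the threshold values, and the English toy strings are all hypothetical and are not taken from the Dan-Shin Files.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]; no word segmentation is needed."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(docs, threshold=0.6):
    """Greedy clustering (an assumed stand-in): a document joins the first
    cluster whose first member is similar enough, else starts a new one."""
    clusters = []
    for doc in docs:
        for group in clusters:
            if similarity(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def mine_template(a, b, min_len=3):
    """Fixed text shared by two documents of the same type, in order.
    The gaps between the fixed pieces are treated as slots."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks() if m.size >= min_len]


def extract(template, doc):
    """Fill the slots of a template from a document.
    Returns the non-empty slot values, or None if the document does not fit."""
    pattern = "(.*?)" + "(.*?)".join(re.escape(p) for p in template) + "(.*)"
    match = re.fullmatch(pattern, doc, flags=re.DOTALL)
    if match is None:
        return None
    return [g for g in match.groups() if g]


# Hypothetical toy documents, invented for illustration only.
docs = [
    "plaintiff Wang sues Chen over land",
    "receipt for 30 dan of rice from Wu",
    "plaintiff Lin sues Zhao over rent",
]
clusters = cluster(docs)
template = mine_template(clusters[0][0], clusters[0][1])
slots = extract(template, "plaintiff Hsu sues Tsai over debt")
print(template)
print(slots)
```

Working on characters rather than words sidesteps the segmentation difficulty the summary mentions, and the sketch relies on the same observation the summary makes: documents of one type share most of their fixed wording, so the parts that differ are the information to extract.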