Information Extraction for Ancient Chinese Corpora

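Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === ROC academic year 87 === As the amount of digitized content grows rapidly, automatically extracting information from a large digital library has become essential to using that content effectively. This thesis discusses an information extraction utility developed for the ancient Chinese language. The corpus used as the test bed for the proposed scheme is the “Dan-Shin Files”. Because of the nature of this corpus, automatically segmenting words, parsing sentences, and determining the relationships between sentences are difficult. Nevertheless, documents of the same type typically follow very similar patterns, which greatly facilitates information extraction. Based on properties observed in the corpus, the thesis builds a mechanism comprising clustering, template mining, and information extraction, and describes its usefulness, its effectiveness, and its role in NTUDLM.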
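The thesis abstract names three stages (clustering, template mining, information extraction) but does not say which algorithms are used. The following is only a minimal Python sketch of how such a pipeline can work, under assumptions that are not taken from the thesis: similarity is the Jaccard overlap of character bigrams (which sidesteps word segmentation), clustering is a simple single-pass "leader" scheme, and templates are mined by aligning cluster members with the standard library's `difflib.SequenceMatcher`, turning every differing span into a slot. The function names, the `SLOT` marker, the thresholds, and the toy English records are all hypothetical stand-ins, not Dan-Shin Files data.

```python
import re
from difflib import SequenceMatcher

SLOT = "\x00"  # hypothetical gap marker; assumed never to occur in the texts


def bigrams(text):
    """Set of character bigrams (a crude stand-in for word segmentation)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of the two documents' bigram sets."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster(docs, threshold=0.5):
    """Single-pass leader clustering: a document joins the first cluster whose
    first member (the leader) is similar enough, otherwise it starts a new one."""
    clusters = []
    for doc in docs:
        for members in clusters:
            if jaccard(members[0], doc) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def mine_template(members, min_len=3):
    """Keep text shared by every member; replace each differing span with SLOT."""
    template = members[0]
    for doc in members[1:]:
        matcher = SequenceMatcher(None, template, doc, autojunk=False)
        parts, a_end, b_end = [], 0, 0
        for i, j, n in matcher.get_matching_blocks():
            if n < min_len:
                continue  # drop short, likely accidental matches (and the final size-0 block)
            if i > a_end or j > b_end:
                parts.append(SLOT)  # unmatched text before this block becomes a slot
            parts.append(template[i:i + n])
            a_end, b_end = i + n, j + n
        if a_end < len(template) or b_end < len(doc):
            parts.append(SLOT)  # unmatched tail becomes a final slot
        template = "".join(parts)
    return template


def extract(template, doc):
    """Match a document against a template; return the slot fillers or None."""
    pattern = "(.*?)".join(re.escape(p) for p in template.split(SLOT))
    m = re.fullmatch(pattern, doc)
    return m.groups() if m else None


# Toy records (invented for illustration only).
docs = [
    "Deed: Wu sells 3 plots to Lin; sealed.",
    "Deed: Kao sells 5 plots to Kuo; sealed.",
    "Memo: the yamen closed on day 9.",
]
groups = cluster(docs)          # two clusters: the two deeds, then the memo
tpl = mine_template(groups[0])  # "Deed: \x00 sells \x00 plots to \x00; sealed."
print(extract(tpl, "Deed: Su sells 12 plots to Yeh; sealed."))  # ('Su', '12', 'Yeh')
```

Working on characters rather than words keeps the sketch usable on classical Chinese text, where the abstract notes that segmentation is hard. The `min_len` filter keeps one shared character inside two different names from being promoted to template text; real records would need tuning of both thresholds and probably a more robust multi-document alignment than this greedy pairwise one.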


Bibliographic Details
Main Authors:	Jia-Yang Chang, 張嘉洋
Other Authors:	Yen-Jen Oyang
Format:	Others
Language:	zh-TW
Published:	1999
Online Access:	http://ndltd.ncl.edu.tw/handle/16427562231824811085

id	ndltd-TW-087NTU00392072
record_format	oai_dc
spelling	ndltd-TW-087NTU00392072 2016-02-01T04:12:40Z http://ndltd.ncl.edu.tw/handle/16427562231824811085 Information Extraction for Ancient Chinese Corpora 古文獻中資訊擷取之研究 Jia-Yang Chang 張嘉洋 Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, ROC academic year 87. As the amount of digitized content grows rapidly, automatically extracting information from a large digital library has become essential to using that content effectively. This thesis discusses an information extraction utility developed for the ancient Chinese language. The corpus used as the test bed for the proposed scheme is the “Dan-Shin Files”. Because of the nature of this corpus, automatically segmenting words, parsing sentences, and determining the relationships between sentences are difficult. Nevertheless, documents of the same type typically follow very similar patterns, which greatly facilitates information extraction. Based on properties observed in the corpus, the thesis builds a mechanism comprising clustering, template mining, and information extraction, and describes its usefulness, its effectiveness, and its role in NTUDLM. Yen-Jen Oyang 歐陽彥正 1999 thesis 45 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
author2	Yen-Jen Oyang
author	Jia-Yang Chang 張嘉洋
title	Information Extraction for Ancient Chinese Corpora
publishDate	1999
url	http://ndltd.ncl.edu.tw/handle/16427562231824811085

