A hybrid approach to automatic text summarization

碩士 === 國立中山大學 === 資訊管理學系研究所 === 96 === Automatic text summarization can efficiently and effectively save users’ time while reading text documents. The objective of automatic text summarization is to extract essential sentences that cover almost all the concepts of a document so that users are able t...

Full description

Bibliographic Details
Main Authors: Li-An Yuan, 袁立安
Other Authors: Chang Te Min
Format: Others
Language:zh-TW
Published: 2007
Online Access:http://ndltd.ncl.edu.tw/handle/8rf2t4
Description
Summary:碩士 === 國立中山大學 === 資訊管理學系研究所 === 96 === Automatic text summarization can efficiently and effectively save users’ time while reading text documents. The objective of automatic text summarization is to extract essential sentences that cover almost all the concepts of a document so that users are able to comprehend the ideas the document tries to address by simply reading through the corresponding summary. This research focuses on developing a hybrid automatic text summarization approach, KCS, to enhancing the quality of summaries. This approach basically consists of two major components: first, it employs the K-mixture probabilistic model to calculate term weights in a statistical sense; it then identifies the term relationship between nouns and nouns as well as nouns and verbs, which results in the connective strength (CS) of nouns. With the connective strengths available scores of sentences can be calculated and ranked to be extracted. We conduct three experiments to justify the proposed approach. The quality of summary is examined by its capability of increasing accuracy of text classification,while the classifier employed, the Naïve Bayes classifier, is kept the same through all experiments. The results show that the K-mixture model is more contributive to document classification than traditional TFIDF weighting scheme. It, however, is still no better than CS, a more complex linguistic-based approach. More importantly, our proposed approach, KCS, performs best among all approaches considered. It implies that KCS can extract more representative sentences from the document and its feasibility in text summarization applications is thus justified.