Statistical Chinese News Summarization


Bibliographic Details
Main Authors: Jeng-Yuan Yang, 楊政遠
Other Authors: Jenq-Haur Wang
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/tu9p36
id ndltd-TW-098TIT05392033
record_format oai_dc
spelling ndltd-TW-098TIT05392033 2019-05-15T20:33:25Z http://ndltd.ncl.edu.tw/handle/tu9p36 Statistical Chinese News Summarization 統計式中文新聞摘要 Jeng-Yuan Yang 楊政遠 碩士 國立臺北科技大學 資訊工程系研究所 98
Jenq-Haur Wang 王正豪 2010 學位論文 ; thesis 68 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taipei University of Technology (國立臺北科技大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程系研究所) === 98 === With the number of news articles published around the world growing every day, reducing the time needed to read them would help users. There are two general approaches to document summarization: multi-document summarization and single-document summarization. Multi-document news summarization resembles a ‘hot topics of the week’ list, which shows only the most important news reports, whereas single-document news summarization is closer to a short abstract that helps readers quickly grasp the overall idea of an article. Single-document news summarization aims to remove as many unimportant words as possible and preserve only the major keywords. This thesis focuses on single-document summarization of Chinese news articles using statistical methods. The proposed architecture is as follows. First, auxiliary vocabularies are collected from news articles and added to the system's dictionary. The original news articles are kept along with the vocabularies. The vocabularies are stored as word bi-grams, together with their document frequency and term frequency. These statistics are then used to calculate the importance of each sentence and to select the most representative sentences as the summary. The experiments used only news articles in the ‘science and technology’ category, since new terms are more easily obtained from it. The experimental results showed that the summaries generated by the system can be effectively clustered with the original news articles. The summaries also greatly reduced the time needed to read each news article, and therefore the total time needed to read all of them. This shows that the system achieves its major goal: reducing news reading time.
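The pipeline described in the abstract (bi-grams stored with document frequency and term frequency, then used to rank sentences) can be illustrated with a minimal sketch. The record does not give the thesis's actual scoring formula, so everything below is an assumption made for illustration: character bi-grams stand in for the word bi-grams, sentence importance is the average TF-IDF weight of a sentence's bi-grams, and the function names (bigrams, split_sentences, build_document_frequency, summarize) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Return overlapping character bi-grams from each run of word characters."""
    grams = []
    for run in re.findall(r"\w+", text):
        grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def split_sentences(article):
    """Split on common Chinese and Western sentence-ending punctuation."""
    parts = re.split(r"[。！？!?]", article)
    return [p.strip() for p in parts if p.strip()]


def build_document_frequency(corpus):
    """Count, for each bi-gram, the number of articles that contain it."""
    df = Counter()
    for article in corpus:
        df.update(set(bigrams(article)))
    return df


def summarize(article, df, n_docs, k=1):
    """Return the k highest-scoring sentences, in their original order.

    Assumed scoring (not taken from the thesis): the mean over a sentence's
    bi-grams of tf * log((1 + n_docs) / (1 + df)).
    """
    sentences = split_sentences(article)
    tf = Counter(bigrams(article))

    def score(sentence):
        grams = bigrams(sentence)
        if not grams:
            return 0.0
        total = sum(tf[g] * math.log((1 + n_docs) / (1 + df[g])) for g in grams)
        return total / len(grams)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Hypothetical toy corpus; a real system would load collected news articles.
    corpus = ["新手機發布。天氣很好。", "天氣很好。股市上漲。", "天氣很好。球賽結束。"]
    df = build_document_frequency(corpus)
    print(summarize(corpus[0], df, n_docs=len(corpus), k=1))
```

Averaging over the number of bi-grams keeps long sentences from winning on length alone; other normalizations, or weights based on the collected vocabulary, would be equally plausible readings of the abstract.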
author2 Jenq-Haur Wang
author_facet Jenq-Haur Wang
Jeng-Yuan Yang
楊政遠
author Jeng-Yuan Yang
楊政遠
author_sort Jeng-Yuan Yang
title Statistical Chinese News Summarization
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/tu9p36