Statistical Chinese News Summarization
Master's thesis === National Taipei University of Technology (國立臺北科技大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程系研究所) === 98 === With the number of news articles published around the world growing every day, it would help users if the time needed to read them could be reduced. Typically, there are two general ways to summarize documents: multi-document summarization and single-document s...
Main Authors: | Jeng-Yuan Yang, 楊政遠 |
---|---|
Other Authors: | Jenq-Haur Wang |
Format: | Others |
Language: | zh-TW |
Published: | 2010 |
Online Access: | http://ndltd.ncl.edu.tw/handle/tu9p36 |
id |
ndltd-TW-098TIT05392033 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-098TIT05392033 2019-05-15T20:33:25Z http://ndltd.ncl.edu.tw/handle/tu9p36 Statistical Chinese News Summarization 統計式中文新聞摘要 Jeng-Yuan Yang 楊政遠 Master's thesis National Taipei University of Technology (國立臺北科技大學) Graduate Institute of Computer Science and Information Engineering (資訊工程系研究所) 98
Jenq-Haur Wang 王正豪 2010 Degree thesis (學位論文) ; thesis 68 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Taipei University of Technology (國立臺北科技大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程系研究所) === 98 === With the number of news articles published around the world growing every day, it would help users if the time needed to read them could be reduced. Typically, there are two general ways to summarize documents: multi-document summarization and single-document summarization. Multi-document news summarization is similar to a 'hot topics of the week' list, which shows only the most important news reports, while single-document news summarization is closer to a short abstract, which helps readers quickly grasp the overall idea of an article. The focus of single-document news summarization is to remove as many unimportant words as possible and preserve only the major keywords. In this paper, we focus mainly on single-document summarization of Chinese news articles using statistical methods.
The proposed architecture is as follows. First, auxiliary vocabularies are collected from news articles and included in the dictionary of our system. The original news articles are kept along with the vocabularies. The vocabularies are stored as word bi-grams, together with their document frequency and term frequency. These statistics are then used to calculate the importance of each sentence and to select the most representative sentences as the summary. In our experiments, we used only news articles in the 'science and technology' category, because new terms are easier to obtain there. The experimental results showed that the news summaries generated by our system can be effectively clustered with the original news articles. The summaries also greatly reduced the time needed to read each news article, which in turn reduces the total time needed to read all of the articles. This shows that the system achieves its major goal: reducing news reading time.
|
author2 |
Jenq-Haur Wang |
author_facet |
Jenq-Haur Wang Jeng-Yuan Yang 楊政遠 |
author |
Jeng-Yuan Yang 楊政遠 |
spellingShingle |
Jeng-Yuan Yang 楊政遠 Statistical Chinese News Summarization |
author_sort |
Jeng-Yuan Yang |
title |
Statistical Chinese News Summarization |
publishDate |
2010 |
url |
http://ndltd.ncl.edu.tw/handle/tu9p36 |
work_keys_str_mv |
AT jengyuanyang statisticalchinesenewssummarization AT yángzhèngyuǎn statisticalchinesenewssummarization AT jengyuanyang tǒngjìshìzhōngwénxīnwénzhāiyào AT yángzhèngyuǎn tǒngjìshìzhōngwénxīnwénzhāiyào |
_version_ |
1719101298943459328 |
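
The abstract above describes scoring sentences from word bi-gram statistics (term frequency and document frequency) and keeping the most representative sentences. The record does not give the thesis's actual formula, so the following is only a minimal sketch under stated assumptions: sentences are already segmented into word tokens, a bi-gram's weight is its term frequency times a smoothed inverse document frequency, a sentence's score is the average weight of its bi-grams, and the function names and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return the adjacent token pairs (word bi-grams) of one sentence."""
    return list(zip(tokens, tokens[1:]))


def document_frequency(corpus):
    """Count, for each bi-gram, how many documents contain it.

    corpus: list of documents; each document is a list of sentences;
    each sentence is a list of word tokens.
    """
    df = Counter()
    for doc in corpus:
        # Use a set so each bi-gram counts once per document.
        seen = {bg for sent in doc for bg in bigrams(sent)}
        df.update(seen)
    return df


def summarize(doc, df, n_docs, k=2):
    """Return the k highest-scoring sentences of doc, in original order."""
    # Term frequency of each bi-gram within this document.
    tf = Counter(bg for sent in doc for bg in bigrams(sent))

    def score(sent):
        grams = bigrams(sent)
        if not grams:
            return 0.0
        # Assumed weighting (not taken from the thesis):
        # TF * smoothed IDF, averaged over the sentence's bi-grams.
        total = sum(
            tf[bg] * math.log((1 + n_docs) / (1 + df[bg])) for bg in grams
        )
        return total / len(grams)

    ranked = sorted(range(len(doc)), key=lambda i: score(doc[i]), reverse=True)
    return [doc[i] for i in sorted(ranked[:k])]


# Toy, pre-tokenized corpus (illustrative data only).
corpus = [
    [["new", "chip", "released", "today"],
     ["the", "weather", "is", "nice"],
     ["new", "chip", "is", "fast"]],
    [["the", "weather", "is", "nice"], ["markets", "fell", "today"]],
    [["the", "weather", "is", "nice"], ["team", "won", "the", "game"]],
]
df = document_frequency(corpus)
summary = summarize(corpus[0], df, n_docs=len(corpus), k=2)
```

In this toy corpus the sentence that appears in every document gets a zero weight, so the two sentences about the new chip are selected. For real Chinese news text, the tokens would have to come from a word segmenter, which this sketch leaves out.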