Refining Chinese Sentences by Removing Words and Choosing Concise Terms


Bibliographic Details
Main Authors: Sven Riemenschneider, 斯文
Other Authors: HSIN-HSI CHEN
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/3kgc6v
id ndltd-TW-106NTU05392042
record_format oai_dc
spelling ndltd-TW-106NTU05392042 2019-07-25T04:46:48Z http://ndltd.ncl.edu.tw/handle/3kgc6v Refining Chinese Sentences by Removing Words and Choosing Concise Terms 詞彙刪簡模型用於中文句子精練 Sven Riemenschneider 斯文 Master 國立臺灣大學 資訊工程學研究所 106 Writing in a professional or formal context requires conciseness. Starting from a colloquial draft, the text is gradually refined and wordiness is removed, resulting in a more formal style. In newspaper editing this is among the most frequent operations, yet it is still carried out manually. We have obtained a year of editing records and provide some insight into this phenomenon. In spoken Chinese, many words are composed of two or more characters; in writing, the same meaning can often be conveyed by a subsequence of those characters. This gives rise to subword deletion. We show this to be an open-class problem, with thousands of different word reduction pairs. Often several reduction or deletion possibilities exist for the same word, which contributes to the difficulty of achieving consistency across a variety of human annotators, given only a single reference and without explicitly formulated rules. We show that a model based on neural machine translation can usually judge with very high precision whether to delete a word, but it suffers from low recall, especially at the subword level. We combine sequence labeling at the word and character levels and attain the best performance for full-word and subword deletion in a single model. Considering the ambiguity inherent in the problem and given only a single reference, our model attains reasonable consistency, especially on grammatical function words with hundreds or even thousands of instances available for training. Open word classes are more difficult to handle, as in many cases only a few instances per word are available. We show that syntactic features are particularly helpful for these cases. HSIN-HSI CHEN 陳信希 2018 學位論文 ; thesis 85 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master === 國立臺灣大學 === 資訊工程學研究所 === 106 === Writing in a professional or formal context requires conciseness. Starting from a colloquial draft, the text is gradually refined and wordiness is removed, resulting in a more formal style. In newspaper editing this is among the most frequent operations, yet it is still carried out manually. We have obtained a year of editing records and provide some insight into this phenomenon. In spoken Chinese, many words are composed of two or more characters; in writing, the same meaning can often be conveyed by a subsequence of those characters. This gives rise to subword deletion. We show this to be an open-class problem, with thousands of different word reduction pairs. Often several reduction or deletion possibilities exist for the same word, which contributes to the difficulty of achieving consistency across a variety of human annotators, given only a single reference and without explicitly formulated rules. We show that a model based on neural machine translation can usually judge with very high precision whether to delete a word, but it suffers from low recall, especially at the subword level. We combine sequence labeling at the word and character levels and attain the best performance for full-word and subword deletion in a single model. Considering the ambiguity inherent in the problem and given only a single reference, our model attains reasonable consistency, especially on grammatical function words with hundreds or even thousands of instances available for training. Open word classes are more difficult to handle, as in many cases only a few instances per word are available. We show that syntactic features are particularly helpful for these cases.
author2 HSIN-HSI CHEN
author_facet HSIN-HSI CHEN
Sven Riemenschneider
斯文
author Sven Riemenschneider
斯文
spellingShingle Sven Riemenschneider
斯文
Refining Chinese Sentences by Removing Words and Choosing Concise Terms
author_sort Sven Riemenschneider
title Refining Chinese Sentences by Removing Words and Choosing Concise Terms
title_short Refining Chinese Sentences by Removing Words and Choosing Concise Terms
title_full Refining Chinese Sentences by Removing Words and Choosing Concise Terms
title_fullStr Refining Chinese Sentences by Removing Words and Choosing Concise Terms
title_full_unstemmed Refining Chinese Sentences by Removing Words and Choosing Concise Terms
title_sort refining chinese sentences by removing words and choosing concise terms
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/3kgc6v
work_keys_str_mv AT svenriemenschneider refiningchinesesentencesbyremovingwordsandchoosingconciseterms
AT sīwén refiningchinesesentencesbyremovingwordsandchoosingconciseterms
AT svenriemenschneider cíhuìshānjiǎnmóxíngyòngyúzhōngwénjùzijīngliàn
AT sīwén cíhuìshānjiǎnmóxíngyòngyúzhōngwénjùzijīngliàn
_version_ 1719229968301424640
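
The abstract above describes combining keep/delete sequence labeling at the word level (full-word deletion) with labeling at the character level (subword deletion). As a minimal, self-contained sketch of how such a pair of label sequences could be applied to a segmented sentence, the Python snippet below rebuilds a refined sentence from hypothetical tags; the function name, example sentence, segmentation, and labels are invented for illustration and are not taken from the thesis.

# Hypothetical sketch: apply word-level keep/delete tags (full-word deletion)
# and per-character keep flags (subword deletion) to a segmented sentence.
def apply_deletions(words, word_tags, char_tags):
    """words: segmented Chinese words; word_tags: 'keep'/'delete' per word;
    char_tags: per word, a list of 0/1 flags where 1 keeps that character."""
    refined = []
    for word, w_tag, c_tags in zip(words, word_tags, char_tags):
        if w_tag == "delete":                       # drop the whole word
            continue
        kept = "".join(ch for ch, keep in zip(word, c_tags) if keep)
        if kept:                                    # keep surviving characters
            refined.append(kept)
    return "".join(refined)

# Invented example: "但是他已經離開了" -> "但他已經離開"
# ("但是" reduced to its subsequence "但", the particle "了" deleted entirely).
words     = ["但是", "他", "已經", "離開", "了"]
word_tags = ["keep", "keep", "keep", "keep", "delete"]
char_tags = [[1, 0], [1], [1, 1], [1, 1], [1]]
print(apply_deletions(words, word_tags, char_tags))  # prints 但他已經離開

In the setting the abstract describes, such tags would be predicted by the word- and character-level sequence-labeling models rather than written by hand; the snippet only illustrates how the two tag levels compose into a refined sentence.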