Refining Chinese Sentences by Removing Words and Choosing Concise Terms (詞彙刪簡模型用於中文句子精練)
Master's === National Taiwan University (國立臺灣大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) === Academic year 106 === Writing in a professional or formal context requires conciseness. Starting from a colloquial draft, the text is gradually refined and wordiness is removed, resulting in a more formal style. For newspaper editing this is among the most frequent operations, yet it is still...
Main Authors: | Sven Riemenschneider 斯文 |
---|---|
Other Authors: | HSIN-HSI CHEN 陳信希 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/3kgc6v |
id | ndltd-TW-106NTU05392042 |
---|---|
record_format | oai_dc |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description |
Master's === National Taiwan University (國立臺灣大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) === Academic year 106 === Writing in a professional or formal context requires conciseness. Starting from a colloquial draft, the text is gradually refined and wordiness is removed, resulting in a more formal style. For newspaper editing this is among the most frequent operations, yet it is still carried out manually.
We have obtained a year of editing records and provide some insight into this phenomenon. In spoken Chinese, many words are composed of two or more characters; in writing, the same meaning can often be conveyed by a subsequence of those characters. This gives rise to subword deletion. We show this to be an open-class problem, with thousands of different word reduction pairs. Often there are several possible reductions or deletions for the same word, which makes it difficult to achieve consistency across human annotators when only a single reference is available and no rules are explicitly formulated.
We show that a model based on neural machine translation can usually judge with very high precision whether to delete a word, but it suffers from low recall, especially at the subword level. By combining sequence labeling at the word and character level, we attain the best performance for both full-word and subword deletion in a single model.
Considering the ambiguity inherent in the problem and given only a single reference, our model attains reasonable consistency, especially on grammatical function words, for which hundreds or even thousands of training instances are available. Open word classes, with in many cases only a few instances per word, are more difficult to handle. We show that syntactic features are particularly helpful for these.
|
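The description's approach of handling full-word and subword deletion with sequence labeling at the word and character level can be pictured with a rough sketch. This is not the thesis implementation; the KEEP/DELETE label scheme, the segmentation, and the example reductions (但是 → 但, 已經 → 已, dropping 了) are assumptions made for illustration only.

```python
# Rough sketch only: character-level KEEP/DELETE labeling over a
# word-segmented sentence, covering both full-word deletion (every
# character of a word labeled DELETE) and subword reduction (only some
# characters deleted). Example words and reductions are illustrative.
from typing import List

KEEP, DELETE = "K", "D"

def apply_labels(words: List[str], labels: List[List[str]]) -> str:
    """Apply per-character KEEP/DELETE labels and rebuild the sentence."""
    kept_words = []
    for word, word_labels in zip(words, labels):
        kept = "".join(ch for ch, lab in zip(word, word_labels) if lab == KEEP)
        if kept:  # an empty result means the whole word was deleted
            kept_words.append(kept)
    return "".join(kept_words)

# Hypothetical colloquial draft, already segmented into words.
words = ["但是", "我們", "已經", "完成", "了"]

# Hypothetical labels: 但是 -> 但 and 已經 -> 已 are subword reductions,
# 了 is deleted as a whole word, everything else is kept.
labels = [
    [KEEP, DELETE],  # 但是 -> 但
    [KEEP, KEEP],    # 我們
    [KEEP, DELETE],  # 已經 -> 已
    [KEEP, KEEP],    # 完成
    [DELETE],        # 了 (full-word deletion)
]

print(apply_labels(words, labels))  # 但我們已完成
```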
author2 | HSIN-HSI CHEN |
author | Sven Riemenschneider 斯文 |
author_sort | Sven Riemenschneider |
title | Refining Chinese Sentences by Removing Words and Choosing Concise Terms |
publishDate | 2018 |
url | http://ndltd.ncl.edu.tw/handle/3kgc6v |
_version_ | 1719229968301424640 |