An Estimation of the Entropy of Chinese
Master's thesis === 國立清華大學 (National Tsing Hua University) === 資訊科學學系 (Department of Computer Science) === 82 === (full abstract under "description" below)
Main Authors: | Yuh-Juh Lin 林玉柱 |
---|---|
Other Authors: | Jyun-Sheng Chang 張俊盛 |
Format: | Others |
Language: | en_US |
Published: | 1994 |
Online Access: | http://ndltd.ncl.edu.tw/handle/82305916772416048578 |
id |
ndltd-TW-082NTHU0394016 |
record_format |
oai_dc |
spelling |
ndltd-TW-082NTHU0394016 2016-07-18T04:09:48Z http://ndltd.ncl.edu.tw/handle/82305916772416048578 An Estimation of the Entropy of Chinese 中文熵值上限的估算 (An Estimation of the Upper Bound of the Entropy of Chinese) Yuh-Juh Lin 林玉柱 Master's thesis === 國立清華大學 === 資訊科學學系 === 82 === (abstract as given under "description" below) Jyun-Sheng Chang 張俊盛 1994 學位論文 ; thesis 41 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === 國立清華大學 === 資訊科學學系 === 82 === This thesis has three tasks: the first is to develop a new
approach to building class-based n-gram models, the second is to
estimate the entropy of Chinese, and the last is to analyze some
bottlenecks in Chinese processing. We accomplish these three closely
related tasks by estimating the cross-entropy between Chinese text and
our language model, and then analyzing the bottlenecks in Chinese
processing from the results. The cross-entropy between Chinese and our
language model is 12.66 bits per word, or 3.88 bits per byte, which is
better than IWCB by 0.6 bits per word. Finally, we diagnose the
bottlenecks in Chinese processing as the Name and Unknown classes,
because they have enormous perplexities in our model and seem hard to
improve substantially.
|
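The abstract above reports cross-entropy per word and per byte for a class-based n-gram model. As a purely illustrative sketch (not the thesis's actual model, class inventory, corpus, or smoothing; the word-to-class map, constants, and toy corpus below are all hypothetical), the following Python shows how a class-based bigram probability P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) can be estimated, and how bits per word, bits per byte (counting Big5 bytes), and per-word perplexity 2^H all fall out of the same log-probability sum.

```python
# Illustrative sketch only: a toy class-based bigram model and the
# cross-entropy bookkeeping the abstract refers to. Nothing here is
# taken from the thesis itself.
import math
from collections import defaultdict

# Hypothetical word -> class map; unseen words fall into "Unknown",
# echoing the classes named in the abstract.
WORD_CLASS = {"清華": "Name", "大學": "Noun", "研究": "Verb", "的": "Particle"}

def word_class(w):
    return WORD_CLASS.get(w, "Unknown")

def train(sentences):
    """Count class-to-class and class-to-word transitions from segmented text."""
    class_bigram = defaultdict(lambda: defaultdict(int))  # counts for P(c_i | c_{i-1})
    class_word = defaultdict(lambda: defaultdict(int))    # counts for P(w_i | c_i)
    for words in sentences:
        prev = "<s>"
        for w in words:
            c = word_class(w)
            class_bigram[prev][c] += 1
            class_word[c][w] += 1
            prev = c
    return class_bigram, class_word

def prob(w, prev_class, class_bigram, class_word, alpha=0.5):
    """P(w | history) ≈ P(class(w) | prev_class) * P(w | class(w)),
    with crude add-alpha smoothing so the sketch never returns zero."""
    c = word_class(w)
    cb, cw = class_bigram[prev_class], class_word[c]
    n_classes = len(class_word) + 1
    p_class = (cb[c] + alpha) / (sum(cb.values()) + alpha * n_classes)
    p_word = (cw[w] + alpha) / (sum(cw.values()) + alpha * (len(cw) + 1))
    return p_class * p_word

def cross_entropy(sentences, class_bigram, class_word):
    """Return (bits per word, bits per byte, per-word perplexity)."""
    log2p, n_words, n_bytes = 0.0, 0, 0
    for words in sentences:
        prev = "<s>"
        for w in words:
            log2p += math.log2(prob(w, prev, class_bigram, class_word))
            n_words += 1
            n_bytes += len(w.encode("big5", errors="replace"))  # 2 bytes per hanzi in Big5
            prev = word_class(w)
    bits_per_word = -log2p / n_words
    return bits_per_word, -log2p / n_bytes, 2 ** bits_per_word

corpus = [["清華", "大學", "的", "研究"], ["研究", "的", "大學"]]
cb, cw = train(corpus)
print(cross_entropy(corpus, cb, cw))
```

With this bookkeeping in mind, the figures quoted in the abstract relate by simple arithmetic: if 12.66 bits per word and 3.88 bits per byte were measured over the same test text, that text averages about 12.66 / 3.88 ≈ 3.3 bytes per word, and the per-word perplexity is roughly 2^12.66 ≈ 6.5 × 10^3.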
author2 |
Jyun-Sheng Chang |
author_facet |
Jyun-Sheng Chang Yuh-Juh Lin 林玉柱 |
author |
Yuh-Juh Lin 林玉柱 |
spellingShingle |
Yuh-Juh Lin 林玉柱 An Estimation of the Entropy of Chinese |
author_sort |
Yuh-Juh Lin |
title |
An Estimation of the Entropy of Chinese |
title_short |
An Estimation of the Entropy of Chinese |
title_full |
An Estimation of the Entropy of Chinese |
title_fullStr |
An Estimation of the Entropy of Chinese |
title_full_unstemmed |
An Estimation of the Entropy of Chinese |
title_sort |
estimation of the entropy of chinese |
publishDate |
1994 |
url |
http://ndltd.ncl.edu.tw/handle/82305916772416048578 |
work_keys_str_mv |
AT yuhjuhlin anestimationoftheentropyofchinese AT línyùzhù anestimationoftheentropyofchinese AT yuhjuhlin zhōngwénshāngzhíshàngxiàndegūsuàn AT línyùzhù zhōngwénshāngzhíshàngxiàndegūsuàn AT yuhjuhlin estimationoftheentropyofchinese AT línyùzhù estimationoftheentropyofchinese |
_version_ |
1718353071107473408 |
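The abstract's closing diagnosis (that the Name and Unknown classes carry enormous perplexities) can be illustrated with a per-class breakdown of the same log-probability bookkeeping. This is a sketch of the general technique only; the class labels and probabilities below are invented, and the thesis's own diagnostic procedure is not reproduced here.

```python
# Illustrative only: breaking a model's cross-entropy down by word class,
# in the spirit of the abstract's diagnosis. The per-token probabilities
# and class labels are made-up numbers, not results from the thesis.
import math
from collections import defaultdict

# (word class, model probability assigned to the token) -- hypothetical.
scored_tokens = [
    ("Noun", 0.05), ("Particle", 0.30), ("Name", 1e-6),
    ("Verb", 0.02), ("Unknown", 5e-7), ("Noun", 0.08),
]

bits = defaultdict(float)
counts = defaultdict(int)
for cls, p in scored_tokens:
    bits[cls] += -math.log2(p)   # bits contributed by this token
    counts[cls] += 1

# Classes with the highest average bits per word are the bottlenecks.
for cls in sorted(bits, key=lambda c: bits[c] / counts[c], reverse=True):
    avg_bits = bits[cls] / counts[cls]
    print(f"{cls:8s}  avg {avg_bits:5.2f} bits/word  perplexity {2 ** avg_bits:,.0f}")
```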