An Estimation of the Entropy of Chinese

Bibliographic Details
Main Author: Yuh-Juh Lin (林玉柱)
Other Authors: Jyun-Sheng Chang
Format: Others
Language: en_US
Published: 1994
Online Access: http://ndltd.ncl.edu.tw/handle/82305916772416048578
id ndltd-TW-082NTHU0394016
record_format oai_dc
spelling ndltd-TW-082NTHU0394016 2016-07-18T04:09:48Z http://ndltd.ncl.edu.tw/handle/82305916772416048578 An Estimation of the Entropy of Chinese 中文熵值上限的估算 Yuh-Juh Lin 林玉柱 Master's National Tsing Hua University Department of Computer Science 82 There are three tasks in this thesis: the first is to develop a new approach to building class-based n-gram models, the second is to estimate the entropy of Chinese, and the last is to analyze some bottlenecks in Chinese processing. We accomplish these three closely related tasks by estimating the cross-entropy between Chinese and our language model, and then analyzing the bottlenecks in Chinese processing from the results. The cross-entropy between Chinese and our language model is 12.66 bits per word, or 3.88 bits per byte, which is better than IWCB by 0.6 bit per word. Finally, we diagnose the bottlenecks in Chinese processing as the Name and Unknown classes, since they have enormous perplexities in our model and seem hard to improve much. Jyun-Sheng Chang 張俊盛 1994 degree thesis ; thesis 41 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Tsing Hua University === Department of Computer Science === 82 === There are three tasks in this thesis: the first is to develop a new approach to building class-based n-gram models, the second is to estimate the entropy of Chinese, and the last is to analyze some bottlenecks in Chinese processing. We accomplish these three closely related tasks by estimating the cross-entropy between Chinese and our language model, and then analyzing the bottlenecks in Chinese processing from the results. The cross-entropy between Chinese and our language model is 12.66 bits per word, or 3.88 bits per byte, which is better than IWCB by 0.6 bit per word. Finally, we diagnose the bottlenecks in Chinese processing as the Name and Unknown classes, since they have enormous perplexities in our model and seem hard to improve much.
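The figures quoted in the description relate to each other through the standard definitions of cross-entropy and perplexity. The short Python sketch below is our own illustration, not anything from the thesis: it converts the reported 12.66 bits per word and 3.88 bits per byte into a word-level perplexity and an implied average word length, and the 2-bytes-per-character assumption (typical of Big5/GB encodings of that period) is ours.

# Back-of-the-envelope check of the cross-entropy figures in the abstract.
# Assumption (ours): each Chinese character occupies 2 bytes.

h_word = 12.66          # reported cross-entropy, bits per word
h_byte = 3.88           # reported cross-entropy, bits per byte

# Perplexity is 2 raised to the cross-entropy.
ppl_word = 2 ** h_word
print(f"word-level perplexity: about {ppl_word:.0f}")            # roughly 6470

# The ratio of the two rates implies the average word length, in bytes,
# used when converting between per-word and per-byte figures.
bytes_per_word = h_word / h_byte
print(f"implied word length: about {bytes_per_word:.2f} bytes")  # roughly 3.26

# Under the 2-bytes-per-character assumption, that is about 1.6 characters per word.
print(f"or about {bytes_per_word / 2:.2f} characters per word")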
author2 Jyun-Sheng Chang
author_facet Jyun-Sheng Chang
Yuh-Juh Lin
林玉柱
author Yuh-Juh Lin
林玉柱
spellingShingle Yuh-Juh Lin
林玉柱
An Estimation of the Entropy of Chinese
author_sort Yuh-Juh Lin
title An Estimation of the Entropy of Chinese
title_short An Estimation of the Entropy of Chinese
title_full An Estimation of the Entropy of Chinese
title_fullStr An Estimation of the Entropy of Chinese
title_full_unstemmed An Estimation of the Entropy of Chinese
title_sort estimation of the entropy of chinese
publishDate 1994
url http://ndltd.ncl.edu.tw/handle/82305916772416048578
work_keys_str_mv AT yuhjuhlin anestimationoftheentropyofchinese
AT línyùzhù anestimationoftheentropyofchinese
AT yuhjuhlin zhōngwénshāngzhíshàngxiàndegūsuàn
AT línyùzhù zhōngwénshāngzhíshàngxiàndegūsuàn
AT yuhjuhlin estimationoftheentropyofchinese
AT línyùzhù estimationoftheentropyofchinese
_version_ 1718353071107473408