An Estimation of the Entropy of Chinese
Master's thesis === 國立清華大學 (National Tsing Hua University) === 資訊科學學系 (Department of Computer Science) === 82 === (full abstract under "description" below)
Main Authors: | Yuh-Juh Lin 林玉柱 |
---|---|
Other Authors: | Jyun-Sheng Chang 張俊盛 |
Format: | Others |
Language: | en_US |
Published: | 1994 |
Online Access: | http://ndltd.ncl.edu.tw/handle/82305916772416048578 |
id |
ndltd-TW-082NTHU0394016 |
record_format |
oai_dc |
spelling |
ndltd-TW-082NTHU0394016 2016-07-18T04:09:48Z http://ndltd.ncl.edu.tw/handle/82305916772416048578 An Estimation of the Entropy of Chinese 中文熵值上限的估算 (An Estimation of the Upper Bound of the Entropy of Chinese) Yuh-Juh Lin 林玉柱 Master's thesis === 國立清華大學 === 資訊科學學系 === 82 === (abstract as given under "description" below) Jyun-Sheng Chang 張俊盛 1994 學位論文 ; thesis 41 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === 國立清華大學 === 資訊科學學系 === 82 === This thesis has three tasks: the first is to develop a new
approach to building class-based n-gram models, the second is to
estimate the entropy of Chinese, and the last is to analyze some
bottlenecks in Chinese processing. We accomplish these three closely
related tasks by estimating the cross-entropy between Chinese text and
our language model, and then analyzing the bottlenecks in Chinese
processing from the results. The cross-entropy between Chinese and our
language model is 12.66 bits per word, or 3.88 bits per byte, which is
better than IWCB by 0.6 bits per word. Finally, we diagnose the
bottlenecks in Chinese processing as the Name and Unknown classes,
because they have enormous perplexities in our model and seem hard to
improve substantially.
|
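The abstract above reports cross-entropy per word and per byte for a class-based n-gram model. As a purely illustrative sketch (not the thesis's actual model, class inventory, corpus, or smoothing; the word-to-class map, constants, and toy corpus below are all hypothetical), the following Python shows how a class-based bigram probability P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) can be estimated, and how bits per word, bits per byte (counting Big5 bytes), and per-word perplexity 2^H all fall out of the same log-probability sum.

```python
# Illustrative sketch only: a toy class-based bigram model and the
# cross-entropy bookkeeping the abstract refers to. Nothing here is
# taken from the thesis itself.
import math
from collections import defaultdict

# Hypothetical word -> class map; unseen words fall into "Unknown",
# echoing the classes named in the abstract.
WORD_CLASS = {"清華": "Name", "大學": "Noun", "研究": "Verb", "的": "Particle"}

def word_class(w):
    return WORD_CLASS.get(w, "Unknown")

def train(sentences):
    """Count class-to-class and class-to-word transitions from segmented text."""
    class_bigram = defaultdict(lambda: defaultdict(int))  # counts for P(c_i | c_{i-1})
    class_word = defaultdict(lambda: defaultdict(int))    # counts for P(w_i | c_i)
    for words in sentences:
        prev = "<s>"
        for w in words:
            c = word_class(w)
            class_bigram[prev][c] += 1
            class_word[c][w] += 1
            prev = c
    return class_bigram, class_word

def prob(w, prev_class, class_bigram, class_word, alpha=0.5):
    """P(w | history) ≈ P(class(w) | prev_class) * P(w | class(w)),
    with crude add-alpha smoothing so the sketch never returns zero."""
    c = word_class(w)
    cb, cw = class_bigram[prev_class], class_word[c]
    n_classes = len(class_word) + 1
    p_class = (cb[c] + alpha) / (sum(cb.values()) + alpha * n_classes)
    p_word = (cw[w] + alpha) / (sum(cw.values()) + alpha * (len(cw) + 1))
    return p_class * p_word

def cross_entropy(sentences, class_bigram, class_word):
    """Return (bits per word, bits per byte, per-word perplexity)."""
    log2p, n_words, n_bytes = 0.0, 0, 0
    for words in sentences:
        prev = "<s>"
        for w in words:
            log2p += math.log2(prob(w, prev, class_bigram, class_word))
            n_words += 1
            n_bytes += len(w.encode("big5", errors="replace"))  # 2 bytes per hanzi in Big5
            prev = word_class(w)
    bits_per_word = -log2p / n_words
    return bits_per_word, -log2p / n_bytes, 2 ** bits_per_word

corpus = [["清華", "大學", "的", "研究"], ["研究", "的", "大學"]]
cb, cw = train(corpus)
print(cross_entropy(corpus, cb, cw))
```

With this bookkeeping in mind, the figures quoted in the abstract relate by simple arithmetic: if 12.66 bits per word and 3.88 bits per byte were measured over the same test text, that text averages about 12.66 / 3.88 ≈ 3.3 bytes per word, and the per-word perplexity is roughly 2^12.66 ≈ 6.5 × 10^3.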
author2 |
Jyun-Sheng Chang |
author_facet |
Jyun-Sheng Chang Yuh-Juh Lin 林玉柱 |
author |
Yuh-Juh Lin 林玉柱 |
spellingShingle |
Yuh-Juh Lin 林玉柱 An Estimation of the Entropy of Chinese |
author_sort |
Yuh-Juh Lin |
title |
An Estimation of the Entropy of Chinese |
title_short |
An Estimation of the Entropy of Chinese |
title_full |
An Estimation of the Entropy of Chinese |
title_fullStr |
An Estimation of the Entropy of Chinese |
title_full_unstemmed |
An Estimation of the Entropy of Chinese |
title_sort |
estimation of the entropy of chinese |
publishDate |
1994 |
url |
http://ndltd.ncl.edu.tw/handle/82305916772416048578 |
work_keys_str_mv |
AT yuhjuhlin anestimationoftheentropyofchinese AT línyùzhù anestimationoftheentropyofchinese AT yuhjuhlin zhōngwénshāngzhíshàngxiàndegūsuàn AT línyùzhù zhōngwénshāngzhíshàngxiàndegūsuàn AT yuhjuhlin estimationoftheentropyofchinese AT línyùzhù estimationoftheentropyofchinese |
_version_ |
1718353071107473408 |
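The abstract's closing diagnosis (that the Name and Unknown classes carry enormous perplexities) can be illustrated with a per-class breakdown of the same log-probability bookkeeping. This is a sketch of the general technique only; the class labels and probabilities below are invented, and the thesis's own diagnostic procedure is not reproduced here.

```python
# Illustrative only: breaking a model's cross-entropy down by word class,
# in the spirit of the abstract's diagnosis. The per-token probabilities
# and class labels are made-up numbers, not results from the thesis.
import math
from collections import defaultdict

# (word class, model probability assigned to the token) -- hypothetical.
scored_tokens = [
    ("Noun", 0.05), ("Particle", 0.30), ("Name", 1e-6),
    ("Verb", 0.02), ("Unknown", 5e-7), ("Noun", 0.08),
]

bits = defaultdict(float)
counts = defaultdict(int)
for cls, p in scored_tokens:
    bits[cls] += -math.log2(p)   # bits contributed by this token
    counts[cls] += 1

# Classes with the highest average bits per word are the bottlenecks.
for cls in sorted(bits, key=lambda c: bits[c] / counts[c], reverse=True):
    avg_bits = bits[cls] / counts[cls]
    print(f"{cls:8s}  avg {avg_bits:5.2f} bits/word  perplexity {2 ** avg_bits:,.0f}")
```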