Summary: | Master's === National Taiwan University of Science and Technology === Department of Electrical Engineering === 92 === In this thesis, a Chinese text compression scheme based on a large-alphabet Burrows-Wheeler transform (BWT) is proposed. First, an input Chinese text file is parsed into tokens drawn from a large alphabet consisting of the BIG-5 and ASCII character codes. The resulting token stream is then processed by BWT, move-to-front (MTF) coding, and arithmetic coding. To improve the speed of the proposed scheme, we have also studied several practical ways of implementing BWT, MTF, and arithmetic coding under large-alphabet parsing. Based on this scheme, a working executable program was developed. In experiments on Chinese text files, our program achieves better compression rates than Win-ZIP, Win-RAR, and BZIP2, with rate improvements of 12.9%, 4.7%, and 1.7%, respectively.
|
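The parse, BWT, and MTF stages described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes one token per character, with Python's `ord()` standing in for the BIG-5/ASCII code. It uses a naive rotation sort rather than any of the faster methods the thesis studies, and it omits the arithmetic coding stage. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def parse_tokens(text: str) -> List[int]:
    """Large-alphabet parsing: one integer token per character.

    Assumption: ord() stands in for the BIG-5/ASCII code of each character.
    """
    return [ord(ch) for ch in text]


def bwt(tokens: List[int]) -> Tuple[List[int], int]:
    """Naive BWT: sort all rotations, return the last column and the
    row index of the original sequence. Slow, for illustration only."""
    n = len(tokens)
    if n == 0:
        return [], 0
    order = sorted(range(n), key=lambda i: tokens[i:] + tokens[:i])
    last = [tokens[(i - 1) % n] for i in order]
    return last, order.index(0)


def inverse_bwt(last: List[int], primary: int) -> List[int]:
    """Invert the BWT by following the first-to-last column mapping."""
    n = len(last)
    # A stable sort of positions by symbol maps each first-column row
    # to the matching occurrence in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    out, idx = [], primary
    for _ in range(n):
        idx = order[idx]
        out.append(last[idx])
    return out


def mtf_encode(symbols: Sequence[int], alphabet: Sequence[int]) -> List[int]:
    """Move-to-front: emit each symbol's current rank, then move it to rank 0."""
    table = list(alphabet)
    ranks = []
    for s in symbols:
        r = table.index(s)  # linear scan; costly when the alphabet is large
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


def mtf_decode(ranks: Sequence[int], alphabet: Sequence[int]) -> List[int]:
    """Inverse of mtf_encode, given the same initial alphabet order."""
    table = list(alphabet)
    symbols = []
    for r in ranks:
        s = table.pop(r)
        symbols.append(s)
        table.insert(0, s)
    return symbols


def roundtrip(text: str) -> str:
    """Run parse -> BWT -> MTF and back, returning the recovered text."""
    tokens = parse_tokens(text)
    alphabet = sorted(set(tokens))  # the decoder would need this table too
    last, primary = bwt(tokens)
    ranks = mtf_encode(last, alphabet)
    recovered = inverse_bwt(mtf_decode(ranks, alphabet), primary)
    return "".join(chr(t) for t in recovered)
```

Calling `roundtrip` on mixed Chinese and ASCII text should return the input unchanged. In a full compressor, the MTF ranks would go to an entropy coder such as the arithmetic coder named in the summary. The linear `table.index` scan is the kind of cost that grows with alphabet size, which is presumably why the thesis examines faster implementations.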