Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems

Master's === National Chiao Tung University === Department of Computer Science and Information Engineering === 88 === The goal of this thesis is to propose a general Chinese document processing system consisting of three modules: preprocessing, recognition kernel, and postprocessing. In the preprocessing module, input images may have small skew angles. These skew angles...


Bibliographic Details
Main Authors: Zhao,San-Lung, 趙善隆
Other Authors: Lee, Hsi-Jian
Format: Others
Language: en_US
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/69736609054301724780
id ndltd-TW-088NCTU0392070
record_format oai_dc
spelling ndltd-TW-088NCTU0392070 2015-10-13T10:59:52Z http://ndltd.ncl.edu.tw/handle/69736609054301724780 Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems 中文文件處理系統中使用之多核心辨識方法與簡化型語言模式 Zhao,San-Lung 趙善隆 Master's National Chiao Tung University Department of Computer Science and Information Engineering 88 Lee, Hsi-Jian 李錫堅 2000 thesis 61 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Department of Computer Science and Information Engineering === 88 === This thesis proposes a general Chinese document processing system consisting of three modules: preprocessing, recognition kernel, and postprocessing. In the preprocessing module, input images may have small skew angles, which degrade the performance of character segmentation and character recognition. A skew angle detection method is used, and a modified rotation transform is proposed to rotate document images. Sentences and characters must then be extracted for the recognition engines, so document images are segmented into text blocks, text lines, and character images. After punctuation marks are detected among the character images, sentences are constructed from the character images. In the recognition module, two recognition engines are used to recognize the character images. Contour directional features and crossing count features are selected for kernel 1, and Oka's cellular features and peripheral background area features are selected for kernel 2. The weights of these kernels and features depend on the relative stroke widths of the character images, which provide a measure of character image quality. When the recognition engines are constructed, the features are trained from a character image database selected from document images. To provide more robust training features and increase the recognition rate, bad features, rather than bad images, are removed in the feature training process. In the postprocessing module, a simplified language model is used. The model includes word selection bound setting, matching order establishment, fast word matching, and most-confident word selection. Using this model speeds up the processing. Experiments performed on more than 40 article images show that the proposed system is effective and efficient.
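The abstract names crossing count features as one input to kernel 1 but does not define them. A minimal sketch of one common definition is shown below, assuming a binary character image stored as rows of 0 (background) and 1 (stroke), and counting background-to-stroke transitions along each row and each column. The function names and the sample `glyph` are hypothetical and are not taken from the thesis, which may compute the feature differently.

```python
from typing import List

Image = List[List[int]]  # assumed encoding: 0 = background, 1 = stroke pixel


def row_crossing_counts(image: Image) -> List[int]:
    """Count background-to-stroke transitions along each row."""
    counts = []
    for row in image:
        previous = 0  # treat the area left of the image as background
        crossings = 0
        for pixel in row:
            if pixel == 1 and previous == 0:
                crossings += 1
            previous = pixel
        counts.append(crossings)
    return counts


def column_crossing_counts(image: Image) -> List[int]:
    """Count background-to-stroke transitions along each column."""
    transposed = [list(column) for column in zip(*image)]
    return row_crossing_counts(transposed)


# Hypothetical 4x5 sample image, for illustration only.
glyph = [
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

row_features = row_crossing_counts(glyph)
column_features = column_crossing_counts(glyph)
```

In this sketch the two lists would be concatenated into one feature vector per character image; how the thesis normalizes or weights such features is not stated in the abstract.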
author2 Lee, Hsi-Jian
author_facet Lee, Hsi-Jian
Zhao,San-Lung
趙善隆
author Zhao,San-Lung
趙善隆
spellingShingle Zhao,San-Lung
趙善隆
Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
author_sort Zhao,San-Lung
title Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_short Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_full Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_fullStr Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_full_unstemmed Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_sort multi-kernel chinese characters recognition and a simplified language model used in general document processing systems
publishDate 2000
url http://ndltd.ncl.edu.tw/handle/69736609054301724780
work_keys_str_mv AT zhaosanlung multikernelchinesecharactersrecognitionandasimplifiedlanguagemodelusedingeneraldocumentprocessingsystems
AT zhàoshànlóng multikernelchinesecharactersrecognitionandasimplifiedlanguagemodelusedingeneraldocumentprocessingsystems
AT zhaosanlung zhōngwénwénjiànchùlǐxìtǒngzhōngshǐyòngzhīduōhéxīnbiànshífāngfǎyǔjiǎnhuàxíngyǔyánmóshì
AT zhàoshànlóng zhōngwénwénjiànchùlǐxìtǒngzhōngshǐyòngzhīduōhéxīnbiànshífāngfǎyǔjiǎnhuàxíngyǔyánmóshì
_version_ 1716835374325563392