Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems

Master's thesis === National Chiao Tung University === Department of Computer Science and Information Engineering === 88 === The goal of this thesis is to propose a general Chinese document processing system that consists of three modules: preprocessing, recognition kernel, and postprocessing. In the preprocessing module, input images may have small skew angles. These skew angles...

Bibliographic Details
Main Authors:	Zhao, San-Lung, 趙善隆
Other Authors:	Lee, Hsi-Jian
Format:	Others
Language:	en_US
Published:	2000
Online Access:	http://ndltd.ncl.edu.tw/handle/69736609054301724780

id	ndltd-TW-088NCTU0392070
record_format	oai_dc
spelling	ndltd-TW-088NCTU0392070 2015-10-13T10:59:52Z http://ndltd.ncl.edu.tw/handle/69736609054301724780 Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems 中文文件處理系統中使用之多核心辨識方法與簡化型語言模式 Zhao,San-Lung 趙善隆 Master's thesis, National Chiao Tung University, Department of Computer Science and Information Engineering, 88. The goal of this thesis is to propose a general Chinese document processing system that consists of three modules: preprocessing, recognition kernel, and postprocessing. In the preprocessing module, input images may have small skew angles, which degrade the performance of character segmentation and character recognition. A skew angle detection method is used, and a modified rotation transform is proposed to rotate document images. In our system, sentences and characters must be extracted for the recognition engines. For this purpose, document images are segmented into text blocks, text lines, and character images. After detecting the punctuation marks among the character images, we construct sentences from the character images. In the recognition module, we use two recognition engines to recognize the character images. Contour directional features and crossing count features are selected for kernel 1, and Oka's cellular features and peripheral background area features are selected for kernel 2. The weights of these kernels and features depend on the relative stroke widths of the character images, which provide a measure of character image quality. When we construct the recognition engines, the features are trained from a character image database selected from document images. To provide more robust training features and increase the recognition rate, bad features, rather than bad images, are removed in the feature training process. In the postprocessing module, a simplified language model is used. The model includes word selection bound setting, matching order establishment, fast word matching, and most-confident word selection. Using this model speeds up processing. Experiments performed on more than 40 article images show that the proposed system is effective and efficient. Lee, Hsi-Jian 李錫堅 2000 學位論文 ; thesis 61 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's thesis === National Chiao Tung University === Department of Computer Science and Information Engineering === 88 === The goal of this thesis is to propose a general Chinese document processing system that consists of three modules: preprocessing, recognition kernel, and postprocessing. In the preprocessing module, input images may have small skew angles, which degrade the performance of character segmentation and character recognition. A skew angle detection method is used, and a modified rotation transform is proposed to rotate document images. In our system, sentences and characters must be extracted for the recognition engines. For this purpose, document images are segmented into text blocks, text lines, and character images. After detecting the punctuation marks among the character images, we construct sentences from the character images. In the recognition module, we use two recognition engines to recognize the character images. Contour directional features and crossing count features are selected for kernel 1, and Oka's cellular features and peripheral background area features are selected for kernel 2. The weights of these kernels and features depend on the relative stroke widths of the character images, which provide a measure of character image quality. When we construct the recognition engines, the features are trained from a character image database selected from document images. To provide more robust training features and increase the recognition rate, bad features, rather than bad images, are removed in the feature training process. In the postprocessing module, a simplified language model is used. The model includes word selection bound setting, matching order establishment, fast word matching, and most-confident word selection. Using this model speeds up processing. Experiments performed on more than 40 article images show that the proposed system is effective and efficient.
author2	Lee, Hsi-Jian
author_facet	Lee, Hsi-Jian Zhao,San-Lung 趙善隆
author	Zhao,San-Lung 趙善隆
spellingShingle	Zhao,San-Lung 趙善隆 Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
author_sort	Zhao,San-Lung
title	Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_short	Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_full	Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_fullStr	Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_full_unstemmed	Multi-kernel Chinese Characters Recognition and A Simplified Language Model Used in General Document Processing Systems
title_sort	multi-kernel chinese characters recognition and a simplified language model used in general document processing systems
publishDate	2000
url	http://ndltd.ncl.edu.tw/handle/69736609054301724780
work_keys_str_mv	AT zhaosanlung multikernelchinesecharactersrecognitionandasimplifiedlanguagemodelusedingeneraldocumentprocessingsystems AT zhàoshànlóng multikernelchinesecharactersrecognitionandasimplifiedlanguagemodelusedingeneraldocumentprocessingsystems AT zhaosanlung zhōngwénwénjiànchùlǐxìtǒngzhōngshǐyòngzhīduōhéxīnbiànshífāngfǎyǔjiǎnhuàxíngyǔyánmóshì AT zhàoshànlóng zhōngwénwénjiànchùlǐxìtǒngzhōngshǐyòngzhīduōhéxīnbiànshífāngfǎyǔjiǎnhuàxíngyǔyánmóshì
_version_	1716835374325563392

Similar Items