Detection and Orientation of Italic Text in Chinese OCR System

碩士 === 國立交通大學 === 資訊工程系 === 87 === Chinese OCR (Chinese Optical Character Recognition) is a well studied subject. Research institutes and manufacturers have been able to provide software with recognition rate of 95% or better. Still there are some issues to solve to make a Chi...

Full description

Bibliographic Details
Main Authors: Shih-Jin Huang, 黃士晉
Other Authors: Jenn-Hann Liou
Format: Others
Language:zh-TW
Published: 1999
Online Access:http://ndltd.ncl.edu.tw/handle/80925495397278572508
Description
Summary:碩士 === 國立交通大學 === 資訊工程系 === 87 === Chinese OCR (Chinese Optical Character Recognition) is a well studied subject. Research institutes and manufacturers have been able to provide software with recognition rate of 95% or better. Still there are some issues to solve to make a Chinese OCR system more satisfactory. Italic text handicaps almost all Chinese OCR software. It does not appear in tradition Chinese books. But it can be seen in most scientific articles, name cards, etc. This thesis study this problem. We do not try to "recognize" the characters because this is a subject very much studied and very well solved. We try to "find the italic text" and subsequently "reoriented" (i.e. transform it back to non-italic) it. We do come up with handsome results. But when the last character of an italic string touches the next non-italic character (this is not rare), we usually are unable to separate them and the result is poor.