Line Removal and Broken Characters Recovery of Clinic Data

Master's thesis, National Central University, Institute of Computer Science and Information Engineering, ROC academic year 87. Many traditional paper forms are still in use, so capturing data from forms automatically is an important goal. Because meaningful filled-in data on a form is very likely to touch the frame l...


Bibliographic Details
Main Author: Jia Yo Xu (許嘉佑)
Other Authors: Kuo Chin Fan
Format: Others
Language: en_US
Published: 1999
Online Access: http://ndltd.ncl.edu.tw/handle/74160709171910677975
id ndltd-TW-087NCU00392018
record_format oai_dc
spelling ndltd-TW-087NCU00392018 2016-07-11T04:13:52Z http://ndltd.ncl.edu.tw/handle/74160709171910677975 Line Removal and Broken Characters Recovery of Clinic Data 病歷資料之線條去除及破碎字處理 Jia Yo Xu 許嘉佑 Master's thesis, National Central University, Institute of Computer Science and Information Engineering, ROC academic year 87 Kuo Chin Fan 范國清 1999 thesis 62 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis, National Central University, Institute of Computer Science and Information Engineering, ROC academic year 87. Many traditional paper forms are still in use, so capturing data from forms automatically is an important goal. Because meaningful filled-in data on a form is very likely to touch the frame lines, it seriously degrades the recognition rate of an OCR system. Removing frame lines without breaking the characters that touch them is therefore an important issue for a form processing system. In the preprocessing stage, the source gray-level image is converted into a bi-level image. The system adopts the Otsu binarization method and the Gatos skew detection method as its binarization and skew detection modules, and uses the inverse operation of the conventional image rotation transform as its skew adjustment module. In the line detection and line removal stage, the horizontal projection of black pixels and the gradient of that projection are used to determine the positions of lines. After line removal, some residual line fragments may remain; these are removed with two masks so that they do not affect the result of broken character recovery. In the broken character recovery stage, linking relation information is collected by directional black run continuity (DBRC). Once this information has been collected, the fill-back method is chosen according to the ratio of the widths of the two linked runs. Finally, postprocessing based on contour vectors fixes some defects of DBRC. In the experiments, 30 clinic data images and 12 notebook images were tested. The recovery rates obtained in the experiments indicate that the proposed method is effective.
author2 Kuo Chin Fan
author_facet Kuo Chin Fan
Jia Yo Xu
許嘉佑
author Jia Yo Xu
許嘉佑
title Line Removal and Broken Characters Recovery of Clinic Data
publishDate 1999
url http://ndltd.ncl.edu.tw/handle/74160709171910677975
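
The abstract describes locating frame lines from the horizontal projection of black pixels and the gradient of that projection. The following is a minimal sketch of that idea, not the thesis's implementation: the function names, the list-of-lists image layout (1 = black, 0 = white), and the thresholds `min_fill` and `min_jump` are all assumptions made for illustration. The removal step shown here simply blanks the detected rows, which is exactly what breaks touching characters and makes a later recovery stage necessary.

```python
from typing import List, Tuple

Image = List[List[int]]  # bi-level image: 1 = black pixel, 0 = white pixel


def horizontal_projection(image: Image) -> List[int]:
    """Count the black pixels in each row."""
    return [sum(row) for row in image]


def detect_horizontal_lines(
    image: Image, min_fill: float = 0.8, min_jump: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (top_row, bottom_row) bands that look like horizontal lines.

    min_fill and min_jump are hypothetical thresholds, expressed as
    fractions of the image width; they are not values from the thesis.
    """
    width = len(image[0])
    profile = horizontal_projection(image)
    # gradient[y] is the change in the projection when entering row y;
    # the final entry is the change when leaving the last row.
    padded = [0] + profile + [0]
    gradient = [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]

    lines = []
    start = None
    for y, g in enumerate(gradient):
        if start is None:
            # A band opens on a sharp rise into a nearly full row.
            if (
                y < len(profile)
                and g >= min_jump * width
                and profile[y] >= min_fill * width
            ):
                start = y
        elif g <= -min_jump * width:
            # A band closes on a sharp fall; the previous row was its last.
            lines.append((start, y - 1))
            start = None
    return lines


def remove_lines(image: Image, lines: List[Tuple[int, int]]) -> Image:
    """Naively blank every row inside the detected bands (copy, not in place)."""
    cleaned = [row[:] for row in image]
    for top, bottom in lines:
        for y in range(top, bottom + 1):
            cleaned[y] = [0] * len(cleaned[y])
    return cleaned


if __name__ == "__main__":
    sample = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 1, 0, 0, 0, 0],  # text stroke above the line
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # two-pixel-thick frame line
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # text stroke below the line
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ]
    bands = detect_horizontal_lines(sample)
    print(bands)
    print(remove_lines(sample, bands))
```

Using the gradient rather than a plain threshold on the projection lets the band edges be placed where the row count changes abruptly, so dense text rows next to a line are less likely to be merged into it. Strokes that crossed the removed rows are left with a gap; the thesis handles that in its separate broken character recovery stage, which is not sketched here.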