Line Removal and Broken Characters Recovery of Clinic Data
Master's thesis === National Central University === Graduate Institute of Computer Science and Information Engineering === ROC academic year 87 === Many traditional paper forms are still in use today, so capturing their data automatically is an important goal. Because the meaningful data filled into a form is very likely to touch the frame l...
Main Authors: | Jia Yo Xu, 許嘉佑 |
---|---|
Other Authors: | Kuo Chin Fan |
Format: | Others |
Language: | en_US |
Published: | 1999 |
Online Access: | http://ndltd.ncl.edu.tw/handle/74160709171910677975 |
id |
ndltd-TW-087NCU00392018 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-087NCU00392018; 2016-07-11T04:13:52Z; http://ndltd.ncl.edu.tw/handle/74160709171910677975; Line Removal and Broken Characters Recovery of Clinic Data; 病歷資料之線條去除及破碎字處理 (Line Removal and Broken Character Processing of Medical Record Data); Jia Yo Xu 許嘉佑; Master's thesis, National Central University, Graduate Institute of Computer Science and Information Engineering, ROC academic year 87; Kuo Chin Fan 范國清; 1999; 學位論文 (thesis), 62 pages; en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Central University === Graduate Institute of Computer Science and Information Engineering === ROC academic year 87 === Many traditional paper forms are still in use today, so capturing their data automatically is an important goal. Because the meaningful data filled into a form is very likely to touch the frame lines, it can seriously degrade the recognition rate of an OCR system. Removing frame lines without breaking the characters that touch them is therefore an important issue for any form processing system.
In the preprocessing stage, the source gray-level image is converted into an ideal bi-level (binary) image. Otsu's binarization method is adopted as the binarization module and Gatos's skew detection method as the skew detection module. Finally, the inverse of the conventional image rotation transform serves as the skew correction module.
In the line detection and removal stage, the horizontal projection of black pixels and the gradient of that projection are used to locate the lines. Some residual line fragments may survive line removal; these are removed with two masks so that they do not affect the result of broken character recovery.
In the broken character recovery stage, linking relation information is collected using directional black run continuity (DBRC). Once the linking relations have been collected, the fill-back method is chosen according to the ratio of the widths of the two linked runs. Finally, a postprocessing step based on contour vectors fixes some of the defects left by DBRC.
In the experiments, 30 clinic data images and 12 notebook images were tested. The experimental recovery rates demonstrate the effectiveness of the proposed method.
|
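The abstract names its techniques but not their parameters. What follows is a minimal pure-Python sketch, under stated assumptions, of three of the steps: Otsu binarization, locating horizontal frame lines from the horizontal projection of black pixels and its gradient, and erasing those lines while filling back pixels where a stroke continues across. It is not the thesis's implementation. The `min_ratio` jump threshold is a hypothetical parameter. The fill-back rule is a simplified vertical-continuity stand-in for DBRC that ignores stroke direction and the run-width ratio. Skew detection and correction, the two residue-removal masks, and the contour-vector postprocessing are omitted. Images are assumed to be lists of rows of 0-255 gray levels with dark ink on light paper.

```python
from typing import List, Tuple

Gray = List[List[int]]    # gray levels 0..255 (0 = darkest)
Binary = List[List[int]]  # 1 = black ink, 0 = white paper


def otsu_threshold(img: Gray) -> int:
    """Gray level t maximizing Otsu's between-class variance (ink = levels <= t)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]              # pixel count of the dark class
        sum0 += t * hist[t]
        w1 = total - w0            # pixel count of the bright class
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_B^2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img: Gray) -> Binary:
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]


def detect_line_bands(bw: Binary, min_ratio: float = 0.6) -> List[Tuple[int, int]]:
    """Locate horizontal lines from the row projection and its gradient.

    A band opens where the projection rises by at least min_ratio * width and
    closes where it falls by at least that much (min_ratio is hypothetical).
    """
    height, width = len(bw), len(bw[0])
    proj = [sum(row) for row in bw] + [0]  # sentinel row closes a band at the bottom
    jump = min_ratio * width
    bands, top = [], None
    for r in range(height + 1):
        grad = proj[r] - (proj[r - 1] if r > 0 else 0)
        if top is None and grad >= jump:
            top = r
        elif top is not None and grad <= -jump:
            bands.append((top, r - 1))
            top = None
    return bands


def remove_lines(bw: Binary, bands: List[Tuple[int, int]]) -> Binary:
    """Erase each band, filling back columns where ink continues above and below.

    Simplified stand-in for DBRC: only straight vertical continuity is checked.
    """
    out = [row[:] for row in bw]
    height = len(bw)
    for top, bottom in bands:
        above, below = top - 1, bottom + 1
        for c in range(len(bw[0])):
            crosses = (above >= 0 and below < height
                       and bw[above][c] == 1 and bw[below][c] == 1)
            for r in range(top, bottom + 1):
                out[r][c] = 1 if crosses else 0
    return out


# Tiny demo: a vertical stroke (column 3) crossing a one-pixel frame line (row 3).
demo = [[255] * 8 for _ in range(7)]
for r in (1, 2, 4, 5):
    demo[r][3] = 0
demo[3] = [0] * 8
bw = binarize(demo)
cleaned = remove_lines(bw, detect_line_bands(bw))
print(*("".join("#" if p else "." for p in row) for row in cleaned), sep="\n")
```

In the demo the frame line is erased while the stroke stays connected through row 3. Because only straight vertical continuity is checked, a slanted stroke crossing the line would lose its pixels inside the band; handling such cases is presumably why the thesis uses a directional, run-based rule with a width-ratio test instead.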
author2 |
Kuo Chin Fan |
author |
Jia Yo Xu 許嘉佑 |
title |
Line Removal and Broken Characters Recovery of Clinic Data |
publishDate |
1999 |
url |
http://ndltd.ncl.edu.tw/handle/74160709171910677975 |