Automatic Form Classification by Feature Graph Matching

Bibliographic Details
Main Authors: Lu Jeng Ming, 盧鎮明
Other Authors: Fan Kuo Chin
Format: Others
Language: en_US
Published: 1995
Online Access: http://ndltd.ncl.edu.tw/handle/51420610776065632181
Description
Summary: Master's thesis === National Central University === Institute of Computer Science and Electronic Engineering === 83 === Form classification plays an important role in an automatic form processing system, because its recognition result directly affects the processing capability of the modules that follow. Hence, current form processing systems all aim to develop a robust form classification algorithm that can recognize unknown forms accurately and quickly. In this thesis, we present a novel method for recognizing forms that treats form classification as a graph matching problem. The approach clusters all feature points extracted from a form according to the Euclidean distance between each pair of feature points. Each cluster has a cluster center; we treat each cluster center as a node in the form's graph representation and the distance between each pair of cluster centers as an edge linking the two corresponding nodes. Applying the same procedure to each form image yields its graph representation. Separating characters from form documents is also an important task in the form classification process. This thesis presents two algorithms for extracting characters from form documents, suited to input form images of different quality. The first algorithm regards character extraction as a pattern clustering problem; it separates all characters from the form document independently of their size, orientation, and location, even when characters touch or overlap lines. The second algorithm is based on the geometric variance of circumscribing rectangles between characters and structured line patterns. Experiments on various kinds of forms demonstrate the feasibility of the proposed methods.
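
The graph construction described in the summary can be illustrated with a minimal sketch. The record does not state which clustering rule the thesis uses, so the distance-threshold (single-linkage) grouping below is an assumption, as are the names `cluster_points`, `build_form_graph`, and the `radius` parameter. The sketch covers only the step from feature points to a form graph, not the graph matching itself.

```python
import math
from itertools import combinations


def cluster_points(points, radius):
    """Group feature points whose Euclidean distance is <= radius.

    Assumption: simple single-linkage grouping via union-find; the
    thesis may use a different clustering rule.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


def build_form_graph(points, radius):
    """Return (nodes, edges): cluster centers and pairwise center distances."""
    clusters = cluster_points(points, radius)
    # Each node is the centroid of one cluster; sorted for a stable order.
    nodes = sorted(
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        for c in clusters
    )
    # Each edge links two nodes and is weighted by their Euclidean distance.
    edges = {
        (i, j): math.dist(nodes[i], nodes[j])
        for i, j in combinations(range(len(nodes)), 2)
    }
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical feature points, e.g. line junctions detected on a form.
    feature_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    nodes, edges = build_form_graph(feature_points, radius=3.0)
    print(nodes)
    print(edges)
```

Because every pair of cluster centers is linked, the result is a complete weighted graph; two forms could then be compared by matching such graphs, a step this sketch leaves out.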