Summary: | Master's === National Central University === Institute of Computer Science and Electronic Engineering === 83 === Form classification plays an important role in automatic form
processing systems because its
result directly affects the processing
capability of the subsequent modules. Hence, work on current form
processing systems focuses on developing a
robust form classification algorithm that can recognize unknown forms accurately and
quickly. In this thesis, we
present a novel method for recognizing forms. Generally
speaking, we treat the form classification problem as a graph
matching problem. Our approach
clusters all feature points extracted from a
form according to the Euclidean distance between each pair of
feature points. Each cluster has a
cluster center. We treat each cluster center
as a node in the form's graph representation and the distance between each
pair of cluster centers as an edge linking the two
corresponding nodes. Applying the same procedure to each
form image yields its corresponding graph
representation. Separating characters from
form documents is also an important task in the form classification
process. This thesis presents two algorithms for extracting characters from form
documents, each suited to a different quality of input
form image. In the first
algorithm, the character extraction problem is regarded as a
pattern clustering problem. With this algorithm, all
characters can be separated from form documents
independently of their size, orientation, and location,
even when characters touch or overlap form lines. The second algorithm
exploits the geometric differences between the circumscribing rectangles
of characters and those of structured line patterns. Experiments
on various kinds of forms demonstrate the feasibility of the
proposed methods.
|
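The graph construction described in the abstract (cluster the feature points, take each cluster center as a node, take the distance between each pair of centers as an edge) can be illustrated with a minimal Python sketch. The abstract does not state which clustering rule or threshold the thesis uses, so the single-linkage grouping below, the `threshold` parameter, and the function names `cluster_points`, `cluster_center`, and `form_graph` are all assumptions made for illustration, not the author's implementation.

```python
import math
from itertools import combinations


def cluster_points(points, threshold):
    """Group 2-D points so that points closer than `threshold` share a cluster.

    Assumption: plain single-linkage grouping via union-find, chosen only
    for illustration; the abstract does not name its clustering rule.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that lie close together.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return list(clusters.values())


def cluster_center(cluster):
    """Return the centroid (mean x, mean y) of a cluster of points."""
    xs, ys = zip(*cluster)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def form_graph(points, threshold):
    """Build a graph: nodes are cluster centers, edges hold center distances."""
    centers = sorted(cluster_center(c) for c in cluster_points(points, threshold))
    edges = {
        (a, b): math.dist(centers[a], centers[b])
        for a, b in combinations(range(len(centers)), 2)
    }
    return centers, edges
```

Calling `form_graph(points, threshold)` on the feature points of each form image would give one such graph per form; comparing two forms would then reduce to comparing their graphs, a step this sketch does not attempt.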