Semantic Document Image Classification Based on Valuable Text Pattern
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Najafabad Branch, Islamic Azad University, 2011-01-01 |
Series: | Journal of Intelligent Procedures in Electrical Technology |
Subjects: | |
Online Access: | http://jipet.iaun.ac.ir/pdf_4459_7ec474f7128023380ad44f248368d8d0.html |
Summary: | Extracting knowledge from detected document images is a complex problem in information technology. The problem becomes harder still because only a negligible percentage of detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. Using a two-stage segmentation approach, the algorithm detects regions of the image and then, in a hierarchical classification, classifies them as document or non-document (pure) regions. A novel definition of value is proposed to classify document images into valuable and invaluable categories. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification. The proposed algorithm achieves an accuracy of 98.8% on the valuable versus invaluable document image classification problem. |
---|---|
ISSN: | 2322-3871 2345-5594 |