Semantic Document Image Classification Based on Valuable Text Pattern

Bibliographic Details
Main Authors: Hossein Pourghassem, Mohammad Sadegh Helforoush, Sabalan Daneshvar
Format: Article
Language: English
Published: Najafabad Branch, Islamic Azad University 2011-01-01
Series: Journal of Intelligent Procedures in Electrical Technology
Online Access: http://jipet.iaun.ac.ir/pdf_4459_7ec474f7128023380ad44f248368d8d0.html
Description
Summary: Knowledge extraction from detected document images is a complex problem in the field of information technology. The problem becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. In this algorithm, a two-stage segmentation approach detects the regions of the image, which a hierarchical classification then labels as document or non-document (pure region) regions. A novel definition of "valuable" is proposed to classify document images into valuable or invaluable (that is, not valuable) categories. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification: it achieves an accuracy of 98.8% on the valuable versus invaluable document image classification problem.
ISSN: 2322-3871, 2345-5594