Semantic Document Image Classification Based on Valuable Text Pattern

Bibliographic Details
Main Authors:	Hossein Pourghassem, Mohammad Sadegh Helforoush, Sabalan Daneshvar
Format:	Article
Language:	English
Published:	Najafabad Branch, Islamic Azad University 2011-01-01
Series:	Journal of Intelligent Procedures in Electrical Technology
Subjects:	semantic classification; document and non-document images; information valuable
Online Access:	http://jipet.iaun.ac.ir/pdf_4459_7ec474f7128023380ad44f248368d8d0.html

Description
Summary:	Knowledge extraction from detected document images is a complex problem in the field of information technology. The problem becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. Using a two-stage segmentation approach, the algorithm detects regions of the image and then, in a hierarchical classification, classifies them as document or non-document (pure region) regions. A novel definition of "valuable" is proposed to classify document images into valuable and non-valuable categories. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification. The proposed algorithm achieves an accuracy of 98.8% on the valuable versus non-valuable document image classification problem.
ISSN:	2322-3871 2345-5594

Similar Items