A New Approach to Extract Text from Images based on DWT and K-means Clustering

Text present in images provides important information for automatic annotation, indexing and retrieval. Therefore, its extraction is a well-known research area in computer vision. However, variations of text due to differences in orientation, alignment, font, size, low image contrast and complex back...


Bibliographic Details
Main Authors: Deepika Ghai, Divya Gera, Neelu Jain
Format: Article
Language: English
Published: Atlantis Press 2016-09-01
Series: International Journal of Computational Intelligence Systems
Subjects:
DWT
Online Access: https://www.atlantis-press.com/article/25868737/view
id doaj-75b0e23482ff4e739b7d98599fd3ec57
record_format Article
volume 9
issue 5
doi 10.1080/18756891.2016.1237189
collection DOAJ
sources DOAJ
issn 1875-6883
description Text present in images provides important information for automatic annotation, indexing and retrieval, so its extraction is a well-known research area in computer vision. However, variations in text orientation, alignment, font and size, together with low image contrast and complex backgrounds, make text extraction extremely challenging. In this paper, we propose a texture-based text extraction method that uses the discrete wavelet transform (DWT) with K-means clustering. First, edges are detected in the image using the DWT. Then, a small overlapping sliding window scans the high-frequency sub-bands, from which texture features of text and non-text regions are extracted. Based on these features, K-means clustering classifies the image into text, simple-background and complex-background clusters. Finally, a voting decision process and area-based filtering are used to locate the text regions precisely. Experiments are carried out on the public ICDAR 2013 dataset and on our own dataset of English, Hindi and Punjabi text images, for different numbers of clusters. The results show that the proposed method performs promisingly across languages in terms of detection rate (DR), precision rate (PR) and recall rate (RR).
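The first three stages named in the description (DWT, sliding-window texture features, K-means) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the Haar wavelet, the window size and step, the mean-absolute-coefficient feature and the farthest-point centroid initialisation are all assumptions chosen for illustration, and the function names are hypothetical. The voting decision and area-based filtering stages are omitted.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT of a list-of-lists image with even dimensions.
    Returns the approximation band and the horizontal, vertical and
    diagonal detail (high-frequency) bands, each half the input size."""
    ll_band, h_band, v_band, d_band = [], [], [], []
    for r in range(0, len(img) - 1, 2):
        ll, hd, vd, dd = [], [], [], []
        for c in range(0, len(img[0]) - 1, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll.append((a + b + d + e) / 2)  # local average
            hd.append((a + b - d - e) / 2)  # top row minus bottom row
            vd.append((a - b + d - e) / 2)  # left column minus right column
            dd.append((a - b - d + e) / 2)  # diagonal difference
        ll_band.append(ll)
        h_band.append(hd)
        v_band.append(vd)
        d_band.append(dd)
    return ll_band, h_band, v_band, d_band


def window_features(bands, size=4, step=2):
    """Slide an overlapping window over the detail bands. The feature used
    here (an assumption) is the mean absolute coefficient per band."""
    rows, cols = len(bands[0]), len(bands[0][0])
    positions, feats = [], []
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            feat = []
            for band in bands:
                vals = [abs(band[i][j])
                        for i in range(r, r + size)
                        for j in range(c, c + size)]
                feat.append(sum(vals) / len(vals))
            positions.append((r, c))
            feats.append(feat)
    return positions, feats


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k=3, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic 16x32 image: flat left half, vertical stripes on the right half.
image = [[255 if (c >= 16 and c % 2) else 0 for c in range(32)]
         for _ in range(16)]
_, hd, vd, dd = haar_dwt2(image)
positions, feats = window_features([hd, vd, dd])
labels, centroids = kmeans(feats, k=3)
```

On this synthetic input the flat windows, the striped windows and the windows straddling the boundary fall into three separate clusters. On real images, deciding which cluster corresponds to text would still require a rule such as the voting step the description mentions.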
topic Text extraction
Texture features
DWT
K-means clustering
sliding window
voting decision