A New Approach to Extract Text from Images based on DWT and K-means Clustering
Text present in images provides important information for automatic annotation, indexing and retrieval. Therefore, its extraction is a well-known research area in computer vision. However, variations of text due to differences in orientation, alignment, font, size, low image contrast and complex back...
Main Authors: | Deepika Ghai, Divya Gera, Neelu Jain |
---|---|
Format: | Article |
Language: | English |
Published: | Atlantis Press, 2016-09-01 |
Series: | International Journal of Computational Intelligence Systems |
Subjects: | Text extraction, Texture features, DWT, K-means clustering, sliding window, voting decision |
Online Access: | https://www.atlantis-press.com/article/25868737/view |
id |
doaj-75b0e23482ff4e739b7d98599fd3ec57 |
---|---|
record_format |
Article |
spelling |
doaj-75b0e23482ff4e739b7d98599fd3ec57; 2020-11-25T01:49:14Z; eng; Atlantis Press; International Journal of Computational Intelligence Systems; 1875-6883; 2016-09-01; 9; 5; 10.1080/18756891.2016.1237189; A New Approach to Extract Text from Images based on DWT and K-means Clustering; Deepika Ghai, Divya Gera, Neelu Jain; https://www.atlantis-press.com/article/25868737/view; Text extraction, Texture features, DWT, K-means clustering, sliding window, voting decision |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Deepika Ghai, Divya Gera, Neelu Jain |
author_facet |
Deepika Ghai, Divya Gera, Neelu Jain |
author_sort |
Deepika Ghai |
title |
A New Approach to Extract Text from Images based on DWT and K-means Clustering |
publisher |
Atlantis Press |
series |
International Journal of Computational Intelligence Systems |
issn |
1875-6883 |
publishDate |
2016-09-01 |
description |
Text present in images provides important information for automatic annotation, indexing and retrieval. Therefore, its extraction is a well-known research area in computer vision. However, variations of text due to differences in orientation, alignment, font, size, low image contrast and complex background make the problem of text extraction extremely challenging. In this paper, we propose a texture-based text extraction method using the discrete wavelet transform (DWT) with K-means clustering. First, edges are detected in the image using the DWT. Then, a small overlapping sliding window scans the high-frequency sub-bands, from which texture features of text and non-text regions are extracted. Based on these features, K-means clustering is employed to classify the image into text, simple background and complex background clusters. Finally, a voting decision process and area-based filtering are used to locate the text regions precisely. Experiments are carried out on the public ICDAR 2013 dataset and on our own dataset of English, Hindi and Punjabi text images, for different numbers of clusters. The results show that the proposed method gives promising results across languages in terms of detection rate (DR), precision rate (PR) and recall rate (RR). |
topic |
Text extraction, Texture features, DWT, K-means clustering, sliding window, voting decision |
url |
https://www.atlantis-press.com/article/25868737/view |
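
The pipeline outlined in the description (DWT sub-bands, overlapping sliding-window texture features, K-means clustering into three clusters, then a voting decision) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything the description leaves unspecified is an assumption made here for illustration: the Haar wavelet, the window size and step, the choice of mean and standard deviation of coefficient magnitudes as texture features, the rule that the cluster with the largest mean magnitude is "text", and the majority-vote threshold. All function names are hypothetical, and the area-based filtering step is omitted.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2-D Haar DWT (averaging form); returns approximation + 3 detail bands."""
    img = np.asarray(img, dtype=float)
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]  # crop to even size
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0  # low-pass along rows
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0  # high-pass along rows
    approx = (lo[0::2] + lo[1::2]) / 2.0
    d1 = (lo[0::2] - lo[1::2]) / 2.0
    d2 = (hi[0::2] + hi[1::2]) / 2.0
    d3 = (hi[0::2] - hi[1::2]) / 2.0
    return approx, d1, d2, d3


def window_features(bands, win=8, step=4):
    """Mean and std of |coefficients| per detail band for each overlapping window."""
    h, w = bands[0].shape
    feats, coords = [], []
    for r in range(0, h - win + 1, step):
        for c in range(0, w - win + 1, step):
            row = []
            for b in bands:
                patch = np.abs(b[r:r + win, c:c + win])
                row += [patch.mean(), patch.std()]
            feats.append(row)
            coords.append((r, c))
    return np.array(feats), coords


def nearest(x, centers):
    """Index of the closest center for every feature vector."""
    return np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)


def kmeans(x, k=3, iters=50, seed=0):
    """Plain Lloyd's algorithm; an empty cluster keeps its previous center."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest(x, centers)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return nearest(x, centers), centers


def text_mask(img, win=8, step=4, k=3):
    """Boolean mask (at sub-band resolution) of pixels voted as text."""
    _, d1, d2, d3 = haar_dwt2(img)
    feats, coords = window_features((d1, d2, d3), win, step)
    labels, centers = kmeans(feats, k)
    # Assumption: the text cluster has the largest mean coefficient magnitude.
    text_cluster = centers[:, 0::2].sum(axis=1).argmax()
    votes = np.zeros(d1.shape)
    cover = np.zeros(d1.shape)
    for (r, c), lab in zip(coords, labels):
        cover[r:r + win, c:c + win] += 1.0
        votes[r:r + win, c:c + win] += float(lab == text_cluster)
    # A pixel counts as text if more than half of its covering windows say so.
    return votes > 0.5 * np.maximum(cover, 1.0)
```

Calling `text_mask(gray)` on a 2-D grayscale array returns a mask at half the input resolution, because one DWT level halves each dimension. In practice a library transform such as PyWavelets could replace the hand-written Haar step.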