A novel local skew correction and segmentation approach for printed multilingual Indian documents
Main Authors: | Narasimha Reddy Soora, Parag S. Deshpande |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2018-09-01 |
Series: | Alexandria Engineering Journal |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1110016817302053 |
id | doaj-ebec612493e940b998477d2aa24ec600 |
---|---|
volume | 57, issue 3, pp. 1609-1618 |
affiliations | Narasimha Reddy Soora: Visvesvaraya National Institute of Technology, Nagpur 440010, India (corresponding author; fax: +91 712 2223969). Parag S. Deshpande: Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur 440010, India. |
collection | DOAJ |
author | Narasimha Reddy Soora, Parag S. Deshpande |
issn | 1110-0168 |
description | To date, many Indian government organizations lack robust software for searching words in scanned office documents written in complex multilingual Indian scripts. Manually searching a single multilingual Indian document takes a few minutes, and tens of thousands of documents may have to be searched for the desired content. Manually searching such a large number of scanned Indian documents is tedious and calls for robust automatic search software. This motivated our work on indexing aged printed multilingual Indian office documents. This paper presents a novel geometrical technique that groups the components belonging to each text line of a document with multiple orientations, and a novel approach for finding the local skew of a Devanagari word. The proposed technique was evaluated on 280 printed Indian documents containing around 6000 text lines in English, Devanagari, and Marathi scripts, and it achieved a 99% success rate for line segmentation, which indicates the validity of the proposed method. To further assess its performance, we used the publicly available Tobacco800 document image database, on which the method achieved significant results compared with a few prominent methods from the literature. |
keywords | Character recognition, Character segmentation, Document analysis, Skew correction, Rough skew, Text line extraction, Word segmentation |
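The abstract does not say how the local skew of a Devanagari word is computed. As a hedged illustration only, and not the authors' method, one common generic idea exploits the headline (shirorekha) that joins the letters of a Devanagari word: try a range of candidate angles, undo each one, and keep the angle under which the ink is most concentrated in a single row. The sketch below assumes a binarized word supplied as a list of foreground pixel coordinates and uses a small-angle shear in place of a true rotation; the function names (`headline_strength`, `estimate_word_skew`, `synthetic_word`), the ±10° search range, and the 0.5° step are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int]  # (x, y) of a foreground pixel; y grows downward


def headline_strength(pixels: Iterable[Pixel], angle_deg: float) -> int:
    """Count of pixels in the densest row after removing a candidate tilt.

    Each pixel is projected onto the row y - x*tan(angle), a small-angle
    shear approximation of rotating the word back by `angle_deg`.
    """
    t = math.tan(math.radians(angle_deg))
    rows = Counter(math.floor(y - x * t + 0.5) for x, y in pixels)
    return max(rows.values(), default=0)


def estimate_word_skew(pixels: Iterable[Pixel],
                       max_deg: float = 10.0,
                       step_deg: float = 0.5) -> float:
    """Return the candidate angle whose de-skewed profile has the tallest peak.

    Assumption (hypothetical, generic technique): the word is Devanagari-like,
    so its headline is the longest horizontal run of ink, and the angle that
    collapses it into a single row is taken as the word's local skew.
    Positive angles mean the headline descends to the right.
    """
    pts = list(pixels)  # materialize so every candidate sees all pixels
    n_steps = int(round(2 * max_deg / step_deg))
    candidates = [-max_deg + i * step_deg for i in range(n_steps + 1)]
    # max() returns the first (smallest) angle when scores tie
    return max(candidates, key=lambda a: headline_strength(pts, a))


def synthetic_word(angle_deg: float, width: int = 60, top: int = 10) -> List[Pixel]:
    """Hypothetical test image: a tilted headline with four hanging strokes."""
    t = math.tan(math.radians(angle_deg))
    pixels = [(x, top + round(x * t)) for x in range(width)]  # headline
    for x in (10, 25, 40, 55):  # vertical strokes below the headline
        pixels += [(x, top + round(x * t) + dy) for dy in range(1, 15)]
    return pixels


print(estimate_word_skew(synthetic_word(3.0)))  # 3.0
```

The max-peak criterion is used here because the headline is usually the densest row of a Devanagari word; a general-purpose variant would instead score the whole projection profile (for example, by the sum of squared row counts). A finer step, or a second narrower search around the best angle, would refine the estimate.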