A novel local skew correction and segmentation approach for printed multilingual Indian documents

To date, many Indian government organizations lack robust software for searching for words in scanned office documents containing complex multilingual Indian scripts. Manually searching one such multilingual document takes a few minutes, and there will be tens of thousands of documents to be...

Bibliographic Details
Main Authors:	Narasimha Reddy Soora, Parag S. Deshpande
Format:	Article
Language:	English
Published:	Elsevier 2018-09-01
Series:	Alexandria Engineering Journal
Online Access:	http://www.sciencedirect.com/science/article/pii/S1110016817302053

id	doaj-ebec612493e940b998477d2aa24ec600
record_format	Article
spelling	doaj-ebec612493e940b998477d2aa24ec600 | 2021-06-02T06:21:25Z | eng | Elsevier | Alexandria Engineering Journal | 1110-0168 | 2018-09-01 | 57 | 3 | 1609 | 1618 | A novel local skew correction and segmentation approach for printed multilingual Indian documents | Narasimha Reddy Soora (0), Parag S. Deshpande (1) | (0) Visvesvaraya National Institute of Technology, Nagpur 440010, India; Corresponding author. Fax: +91 712 2223969. | (1) Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur 440010, India | To date, many Indian government organizations lack robust software for searching for words in scanned office documents containing complex multilingual Indian scripts. Manually searching one such multilingual document takes a few minutes, and there may be tens of thousands of documents to search for the desired content. Manually searching such a large number of scanned documents is tedious and calls for robust automatic search software. This led us to work toward indexing aged printed multilingual Indian office documents. This paper presents a novel geometrical technique for grouping the components that belong to a text line in a document with multiple orientations, and a novel approach for finding the local skew of a Devanagari word. The proposed technique was evaluated on 280 printed Indian documents containing around 6000 text lines in English, Devanagari, and Marathi scripts, and achieved a 99% success rate for line segmentation, which indicates the validity of the proposed method. To further assess the method, we used the publicly available Tobacco800 document image database and achieved significant performance results compared with a few of the prominent methods from the literature. Keywords: Character recognition, Character segmentation, Document analysis, Skew correction, Rough skew, Text line extraction, Word segmentation | http://www.sciencedirect.com/science/article/pii/S1110016817302053
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Narasimha Reddy Soora Parag S. Deshpande
spellingShingle	Narasimha Reddy Soora Parag S. Deshpande A novel local skew correction and segmentation approach for printed multilingual Indian documents Alexandria Engineering Journal
author_facet	Narasimha Reddy Soora Parag S. Deshpande
author_sort	Narasimha Reddy Soora
title	A novel local skew correction and segmentation approach for printed multilingual Indian documents
title_sort	novel local skew correction and segmentation approach for printed multilingual indian documents
publisher	Elsevier
series	Alexandria Engineering Journal
issn	1110-0168
publishDate	2018-09-01
description	To date, many Indian government organizations lack robust software for searching for words in scanned office documents containing complex multilingual Indian scripts. Manually searching one such multilingual document takes a few minutes, and there may be tens of thousands of documents to search for the desired content. Manually searching such a large number of scanned documents is tedious and calls for robust automatic search software. This led us to work toward indexing aged printed multilingual Indian office documents. This paper presents a novel geometrical technique for grouping the components that belong to a text line in a document with multiple orientations, and a novel approach for finding the local skew of a Devanagari word. The proposed technique was evaluated on 280 printed Indian documents containing around 6000 text lines in English, Devanagari, and Marathi scripts, and achieved a 99% success rate for line segmentation, which indicates the validity of the proposed method. To further assess the method, we used the publicly available Tobacco800 document image database and achieved significant performance results compared with a few of the prominent methods from the literature. Keywords: Character recognition, Character segmentation, Document analysis, Skew correction, Rough skew, Text line extraction, Word segmentation
url	http://www.sciencedirect.com/science/article/pii/S1110016817302053
_version_	1721407769256394752

Similar Items