Extraction of Text Lines from Handwritten Documents Using Piecewise Water Flow Technique

A novel piecewise water flow technique for text line extraction from multi-skewed document images of handwritten text of different scripts is presented here. The basic water flow technique assumes that the hypothetical water flows from both left and right sides of the image frame. This flow of water...

Full description

Bibliographic Details
Main Authors: Sarkar Ram, Das Nibaran, Basu Subhadip, Kundu Mahantapas, Nasipuri Mita
Format: Article
Language:English
Published: De Gruyter 2014-09-01
Series:Journal of Intelligent Systems
Subjects:
Online Access:https://doi.org/10.1515/jisys-2013-0018
Description
Summary:A novel piecewise water flow technique for text line extraction from multi-skewed document images of handwritten text of different scripts is presented here. The basic water flow technique assumes that the hypothetical water flows from both left and right sides of the image frame. This flow of water fills up the gaps between consecutive objects (texts) but faces obstruction if any object lies in the path of the flow. All unwetted regions in the document image are then labeled distinctly to extract the text lines. However, the technique fails when two neighboring text lines touch each other, as water gets obstructed by the touching segment(s). To get rid of this difficulty, we have modified the basic water flow technique by iteratively applying the same over the vertically segmented document images. The main purpose of this vertical segmentation is to localize the text line segment(s) where two text lines get joined. These segments are then horizontally fragmented, and each fragment is placed suitably to the text line in which it actually belongs to. This way, the probable data loss during isolation of the touching text line segment is minimized. Both the techniques (current and basic ones) have been tested on three different databases, viz., CMATERdb 1.1.1, CMATERdb 1.1.2, and ICDAR2009 handwritten segmentation contest pages, respectively. The test results show that the present technique outperforms the basic one for all three databases.
ISSN:0334-1860
2191-026X