A Novel Dataset for English-Arabic Scene Text Recognition (EASTR)-42K and Its Evaluation Using Invariant Feature Extraction on Detected Extremal Regions

Bibliographic Details
Main Authors:	Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzak, Rubiyah Bte Yusof
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Cursive script, invariant, extremal, MDLSTM
Online Access:	https://ieeexplore.ieee.org/document/8641268/

Description
Summary:	The recognition of text in natural scene images is a practical yet challenging task because of large variations in background, texture, font, and illumination. In the Gulf countries, English is extensively used as a secondary language alongside Arabic script. This paper therefore introduces the English-Arabic Scene Text Recognition (EASTR)-42K dataset of scene text images. The dataset contains text images in both English and Arabic scripts, with the primary focus on Arabic script, and can be used to evaluate text segmentation and recognition tasks. To provide insight for other researchers, experiments were carried out on the segmentation and classification of Arabic and English text, reporting error rates of 5.99% and 2.48%, respectively. The paper also presents a novel technique that adapts the maximally stable extremal region (MSER) method and extracts scale-invariant features from MSER-detected regions. To select discriminative and comprehensive features, the size of the invariant features is restricted, and only features that lie within the extremal region are considered. An adapted multi-dimensional long short-term memory (MDLSTM) network is presented to tackle the complexities of cursive scene text. Because research on Arabic scene text is still in its infancy, this paper provides benchmark work in the field of text analysis.
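The feature-selection step described in the summary (keeping only scale-invariant features that fall inside MSER-detected regions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: regions are assumed to be lists of (x, y) pixel coordinates, `Keypoint` is a hypothetical stand-in for a scale-invariant keypoint, and `max_size` is a hypothetical parameter reflecting one possible reading of "the size of the invariant features is restricted".

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) image coordinates


class Keypoint(NamedTuple):
    """Hypothetical stand-in for a scale-invariant keypoint (e.g. SIFT)."""
    x: float
    y: float
    size: float      # keypoint scale (neighbourhood diameter, in pixels)
    response: float  # detector strength; higher means more stable


def region_mask(regions: Iterable[Iterable[Pixel]]) -> Set[Pixel]:
    """Union of all pixels covered by the detected extremal regions."""
    mask: Set[Pixel] = set()
    for region in regions:
        mask.update((int(x), int(y)) for x, y in region)
    return mask


def keypoints_in_regions(
    keypoints: Iterable[Keypoint],
    regions: Iterable[Iterable[Pixel]],
    max_size: float = float("inf"),  # hypothetical cap on keypoint scale
) -> List[Keypoint]:
    """Keep keypoints whose centre lies on an extremal-region pixel
    and whose scale does not exceed max_size."""
    mask = region_mask(regions)
    kept = [
        kp for kp in keypoints
        if (int(round(kp.x)), int(round(kp.y))) in mask and kp.size <= max_size
    ]
    # Strongest features first, so callers can truncate to a fixed budget.
    return sorted(kept, key=lambda kp: kp.response, reverse=True)


if __name__ == "__main__":
    # One 10x10 region covering x in [10, 19], y in [5, 14].
    regions = [[(x, y) for x in range(10, 20) for y in range(5, 15)]]
    kps = [
        Keypoint(12.4, 7.6, 4.0, 0.9),    # inside, small scale: kept
        Keypoint(15.0, 10.0, 30.0, 0.8),  # inside, scale too large: dropped
        Keypoint(40.0, 40.0, 3.0, 0.95),  # outside the region: dropped
    ]
    print(keypoints_in_regions(kps, regions, max_size=16.0))
    # → [Keypoint(x=12.4, y=7.6, size=4.0, response=0.9)]
```

In practice, inputs of this shape could come from OpenCV, for example regions from `cv2.MSER_create().detectRegions(gray)` and keypoints from `cv2.SIFT_create().detect(gray, None)` (whose `KeyPoint` objects expose `pt`, `size`, and `response`); how the paper adapts MSER and sets the size limit is not specified here.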
ISSN:	2169-3536

Similar Items