GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection

Scene text detection (STD) is an irreplaceable step in a scene text reading system. It remains a more challenging task than general object detection since text objects are of arbitrary orientations and varying sizes. Generally, segmentation methods that use U-Net or hourglass-like networks are the m...

Bibliographic Details
Main Authors: Meng Cao, Yuexian Zou, Dongming Yang, Chao Liu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
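Subjects: Scene text detection; multi-oriented text; segmentation network; contextual attention; gradient vanishing/exploding problems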
Online Access: https://ieeexplore.ieee.org/document/8709682/
id doaj-b34fca0f6a6e4983aa41dd9125208aff
record_format Article
spelling doaj-b34fca0f6a6e4983aa41dd9125208aff
Record timestamp: 2021-03-29T22:58:49Z
Language code: eng
Publisher: IEEE
Journal: IEEE Access, ISSN 2169-3536
Publication date: 2019-01-01
Volume 7, pages 62805-62816
DOI: 10.1109/ACCESS.2019.2915513
IEEE document number: 8709682
Title: GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection
Authors: Meng Cao (https://orcid.org/0000-0002-8946-4228), Yuexian Zou, Dongming Yang, Chao Liu
Affiliation (all four authors): ADSPLAB, School of Electronic and Computer Engineering, Peking University, Shenzhen, China
collection DOAJ
language English
format Article
sources DOAJ
author Meng Cao
Yuexian Zou
Dongming Yang
Chao Liu
title GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Scene text detection (STD) is an irreplaceable step in a scene text reading system. It remains more challenging than general object detection because text objects have arbitrary orientations and varying sizes. Segmentation methods that use U-Net or hourglass-like networks are the mainstream approaches to multi-oriented text detection. However, experience has shown that text-like objects in complex backgrounds produce high response values on the output feature map of U-Net, which leads to a high false-positive rate and degrades STD performance. To tackle this issue, an adaptive soft attention mechanism called the contextual attention module (CAM) is devised and integrated into U-Net to highlight salient areas while retaining more detail information. In addition, the gradient vanishing and exploding problems make U-Net more difficult to train because of the nonlinear deconvolution layer used in the up-sampling process. To facilitate training, a gradient-inductive module (GIM) is designed to provide a linear bypass that makes gradient back-propagation more stable. Accordingly, an end-to-end trainable Gradient-Inductive Segmentation network with Contextual Attention (GISCA) is proposed. Experimental results on three public benchmarks demonstrate that GISCA achieves state-of-the-art results in terms of f-measure: 92.1%, 87.3%, and 81.4% on ICDAR 2013, ICDAR 2015, and MSRA TD500, respectively.
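The two ideas named in the description, a soft attention gate and a linear bypass, can be illustrated with a minimal sketch. This record does not give the internals of CAM or GIM, so the code below is not the authors' design: it assumes a generic sigmoid gate that rescales feature responses and a generic additive identity skip around a nonlinear branch. All function names and numeric values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued score to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_attention(features, scores):
    """Generic soft attention (assumption, not the paper's CAM):
    scale each feature response by a sigmoid weight from its score."""
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    return [f * sigmoid(s) for f, s in zip(features, scores)]


def with_linear_bypass(x, nonlinear_branch):
    """Generic linear bypass (assumption, not the paper's GIM):
    return x + branch(x), so the identity term keeps a linear path
    for gradients even where the branch saturates."""
    return [xi + bi for xi, bi in zip(x, nonlinear_branch(x))]


def relu(values):
    """Stand-in nonlinear branch used only for this illustration."""
    return [max(0.0, v) for v in values]


# Hypothetical responses: real text, text-like clutter, plain background.
features = [0.9, 0.8, 0.1]
# Hypothetical attention scores: high for text, low elsewhere.
scores = [4.0, -4.0, -4.0]

gated = soft_attention(features, scores)
bypassed = with_linear_bypass([-1.0, 2.0], relu)
```

In this toy setting the gate keeps the response for real text nearly unchanged while strongly damping the text-like clutter, which is the kind of false-positive suppression the description attributes to attention. The bypass output keeps the negative input that the nonlinear branch alone would zero out.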
topic Scene text detection
multi-oriented text
segmentation network
contextual attention
gradient vanishing/exploding problems
url https://ieeexplore.ieee.org/document/8709682/