GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection
Scene text detection (STD) is an irreplaceable step in a scene text reading system. It remains a more challenging task than general object detection since text objects are of arbitrary orientations and varying sizes. Generally, segmentation methods that use U-Net or hourglass-like networks are the m...
Main Authors: | Meng Cao, Yuexian Zou, Dongming Yang, Chao Liu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
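Subjects: | Scene text detection, multi-oriented text, segmentation network, contextual attention, gradient vanishing/exploding problems |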
Online Access: | https://ieeexplore.ieee.org/document/8709682/ |
id |
doaj-b34fca0f6a6e4983aa41dd9125208aff |
---|---|
record_format |
Article |
spelling |
doaj-b34fca0f6a6e4983aa41dd9125208aff 2021-03-29T22:58:49Z eng IEEE IEEE Access 2169-3536 2019-01-01 7 62805 62816 10.1109/ACCESS.2019.2915513 8709682 GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection Meng Cao 0 https://orcid.org/0000-0002-8946-4228 Yuexian Zou 1 Dongming Yang 2 Chao Liu 3 ADSPLAB, School of Electronic and Computer Engineering, Peking University, Shenzhen, China ADSPLAB, School of Electronic and Computer Engineering, Peking University, Shenzhen, China ADSPLAB, School of Electronic and Computer Engineering, Peking University, Shenzhen, China ADSPLAB, School of Electronic and Computer Engineering, Peking University, Shenzhen, China Scene text detection (STD) is an irreplaceable step in a scene text reading system. It remains a more challenging task than general object detection since text objects are of arbitrary orientations and varying sizes. Generally, segmentation methods that use U-Net or hourglass-like networks are the mainstream approaches in multi-oriented text detection tasks. However, experience has shown that text-like objects in the complex background have high response values on the output feature map of U-Net, which leads to a severe false positive rate and degrades STD performance. To tackle this issue, an adaptive soft attention mechanism called the contextual attention module (CAM) is devised and integrated into U-Net to highlight salient areas while retaining more detail information. Besides, the gradient vanishing and exploding problems make U-Net more difficult to train because of the nonlinear deconvolution layer used in the up-sampling process. To facilitate the training process, a gradient-inductive module (GIM) is carefully designed to provide a linear bypass that makes the gradient back-propagation process more stable. Accordingly, an end-to-end trainable Gradient-Inductive Segmentation network with Contextual Attention (GISCA) is proposed. The experimental results on three public benchmarks demonstrate that the proposed GISCA achieves state-of-the-art results in terms of f-measure: 92.1%, 87.3%, and 81.4% for ICDAR 2013, ICDAR 2015, and MSRA TD500, respectively. https://ieeexplore.ieee.org/document/8709682/ Scene text detection multi-oriented text segmentation network contextual attention gradient vanishing/exploding problems |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Meng Cao Yuexian Zou Dongming Yang Chao Liu |
spellingShingle |
Meng Cao Yuexian Zou Dongming Yang Chao Liu GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection IEEE Access Scene text detection multi-oriented text segmentation network contextual attention gradient vanishing/exploding problems |
author_facet |
Meng Cao Yuexian Zou Dongming Yang Chao Liu |
author_sort |
Meng Cao |
title |
GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection |
title_short |
GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection |
title_full |
GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection |
title_fullStr |
GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection |
title_full_unstemmed |
GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection |
title_sort |
gisca: gradient-inductive segmentation network with contextual attention for scene text detection |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Scene text detection (STD) is an irreplaceable step in a scene text reading system. It remains a more challenging task than general object detection since text objects are of arbitrary orientations and varying sizes. Generally, segmentation methods that use U-Net or hourglass-like networks are the mainstream approaches in multi-oriented text detection tasks. However, experience has shown that text-like objects in the complex background have high response values on the output feature map of U-Net, which leads to a severe false positive rate and degrades STD performance. To tackle this issue, an adaptive soft attention mechanism called the contextual attention module (CAM) is devised and integrated into U-Net to highlight salient areas while retaining more detail information. Besides, the gradient vanishing and exploding problems make U-Net more difficult to train because of the nonlinear deconvolution layer used in the up-sampling process. To facilitate the training process, a gradient-inductive module (GIM) is carefully designed to provide a linear bypass that makes the gradient back-propagation process more stable. Accordingly, an end-to-end trainable Gradient-Inductive Segmentation network with Contextual Attention (GISCA) is proposed. The experimental results on three public benchmarks demonstrate that the proposed GISCA achieves state-of-the-art results in terms of f-measure: 92.1%, 87.3%, and 81.4% for ICDAR 2013, ICDAR 2015, and MSRA TD500, respectively. |
topic |
Scene text detection multi-oriented text segmentation network contextual attention gradient vanishing/exploding problems |
url |
https://ieeexplore.ieee.org/document/8709682/ |
work_keys_str_mv |
AT mengcao giscagradientinductivesegmentationnetworkwithcontextualattentionforscenetextdetection AT yuexianzou giscagradientinductivesegmentationnetworkwithcontextualattentionforscenetextdetection AT dongmingyang giscagradientinductivesegmentationnetworkwithcontextualattentionforscenetextdetection AT chaoliu giscagradientinductivesegmentationnetworkwithcontextualattentionforscenetextdetection |
_version_ |
1724190468739170304 |
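
The description field above names two ideas: a soft attention gate that highlights salient areas, and a linear bypass around a nonlinear up-sampling step. The toy sketch below illustrates those general ideas on 1-D lists in plain Python. It is not the authors' CAM or GIM, whose actual layers this record does not describe. Every function name, weight, and the use of nearest-neighbour up-sampling plus ReLU as a stand-in for a deconvolution layer is an assumption made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_attention_gate(skip, context, w_skip=1.0, w_ctx=1.0, bias=0.0):
    """Hypothetical soft attention: scale each skip feature by a weight in (0, 1).

    The weight is computed from the skip feature and a context feature,
    so positions with strong supporting context keep most of their value.
    """
    weights = [sigmoid(w_skip * s + w_ctx * c + bias) for s, c in zip(skip, context)]
    gated = [a * s for a, s in zip(weights, skip)]
    return gated, weights


def upsample_nearest(x, factor=2):
    """Linear up-sampling: repeat every value `factor` times."""
    return [v for v in x for _ in range(factor)]


def nonlinear_upsample(x, weight=0.5):
    """Assumed stand-in for a deconvolution layer: up-sample, scale, then ReLU."""
    return [max(0.0, weight * v) for v in upsample_nearest(x)]


def upsample_with_linear_bypass(x, weight=0.5):
    """Add a purely linear path in parallel with the nonlinear one.

    Where ReLU outputs zero (and passes no gradient), the bypass term still
    depends linearly on the input, which is the intuition behind a bypass.
    """
    main = nonlinear_upsample(x, weight)
    bypass = upsample_nearest(x)
    return [m + b for m, b in zip(main, bypass)]


if __name__ == "__main__":
    gated, weights = soft_attention_gate([1.0, 1.0], [5.0, -5.0])
    print(weights)   # first weight close to 1, second close to 0
    print(nonlinear_upsample([-1.0, 2.0]))
    print(upsample_with_linear_bypass([-1.0, 2.0]))
```

In this sketch the negative input is zeroed by the nonlinear path alone but still reaches the output through the bypass, and the gate suppresses the feature whose context is strongly negative.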