Scale Adaptive Feature Pyramid Networks for 2D Object Detection

Object detection is one of the core tasks in computer vision. Object detection algorithms often have difficulty detecting objects with diverse scales, especially those with smaller scales. To cope with this issue, Lin et al. proposed feature pyramid networks (FPNs), which aim for a feature pyramid with higher semantic content at every scale level. The FPN consists of a bottom-up pyramid and a top-down pyramid. The bottom-up pyramid is induced by a convolutional neural network as its layers of feature maps. The top-down pyramid is formed by progressive up-sampling of a highly semantic yet low-resolution feature map at the top of the bottom-up pyramid. At each up-sampling step, feature maps of the bottom-up pyramid are fused with the top-down pyramid to produce highly semantic yet high-resolution feature maps in the top-down pyramid. Despite significant improvement, the FPN still misses small-scale objects. To further improve the detection of small-scale objects, this paper proposes scale adaptive feature pyramid networks (SAFPNs). The SAFPN employs weights chosen adaptively for each input image in fusing feature maps of the bottom-up pyramid and top-down pyramid. Scale adaptive weights are computed by a scale attention module built into the feature map fusion computation. The scale attention module is trained end-to-end to adapt to the scale of objects contained in images of the training dataset. Experimental evaluation, using both the two-stage detector Faster R-CNN and the one-stage detector RetinaNet, demonstrated the proposed approach's effectiveness.
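The per-level fusion step the abstract describes can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the function names are invented, and the scale attention module is stood in for by global average pooling followed by a learned linear map and a sigmoid, producing one fusion weight per image.

```python
# Toy sketch of SAFPN-style scale-adaptive fusion at one pyramid level.
# All names (upsample2x_nearest, scale_attention_weight, fuse) and the
# attention form are illustrative assumptions, not the paper's code.
import numpy as np

def upsample2x_nearest(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scale_attention_weight(lateral, w, b):
    """Stand-in for the scale attention module: global average pooling
    per channel, then a learned linear map + sigmoid, yielding a
    fusion weight in (0, 1) that adapts to the input image."""
    pooled = lateral.mean(axis=(1, 2))      # (C,) per-channel averages
    return sigmoid(pooled @ w + b)          # scalar weight

def fuse(lateral, top_down, w, b):
    """Fuse a bottom-up (lateral) map with the upsampled top-down map,
    weighting the lateral contribution by the scale attention output."""
    alpha = scale_attention_weight(lateral, w, b)
    return alpha * lateral + upsample2x_nearest(top_down)

rng = np.random.default_rng(0)
lateral = rng.standard_normal((8, 16, 16))   # bottom-up map, C=8
top_down = rng.standard_normal((8, 8, 8))    # coarser top-down map
w, b = rng.standard_normal(8), 0.0           # toy attention parameters
fused = fuse(lateral, top_down, w, b)
print(fused.shape)  # (8, 16, 16)
```

In a real network the attention parameters would be learned end-to-end with the detector, so the fusion weight adapts to the object scales present in each training image; a plain FPN corresponds to fixing the weight at 1.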


Bibliographic Details
Main Authors: Lifei He, Ming Jiang, Ryutarou Ohbuchi, Takahiko Furuya, Min Zhang, Pengfei Li
Format: Article
Language: English
Published: Hindawi Limited, 2020-01-01
Series: Scientific Programming
ISSN: 1875-919X
Online Access: http://dx.doi.org/10.1155/2020/8839979