Scale Adaptive Feature Pyramid Networks for 2D Object Detection
Object detection is one of the core tasks in computer vision. Object detection algorithms often have difficulty detecting objects with diverse scales, especially those with smaller scales. To cope with this issue, Lin et al. proposed feature pyramid networks (FPNs), which aim for a feature pyramid with higher semantic content at every scale level. The FPN consists of a bottom-up pyramid and a top-down pyramid. The bottom-up pyramid is induced by a convolutional neural network as its layers of feature maps. The top-down pyramid is formed by progressive up-sampling of a highly semantic yet low-resolution feature map at the top of the bottom-up pyramid. At each up-sampling step, feature maps of the bottom-up pyramid are fused with the top-down pyramid to produce highly semantic yet high-resolution feature maps in the top-down pyramid. Despite significant improvement, the FPN still misses small-scale objects. To further improve the detection of small-scale objects, this paper proposes scale adaptive feature pyramid networks (SAFPNs). The SAFPN employs weights chosen adaptively to each input image in fusing feature maps of the bottom-up pyramid and top-down pyramid. Scale adaptive weights are computed by using a scale attention module built into the feature map fusion computation. The scale attention module is trained end-to-end to adapt to the scale of objects contained in images of the training dataset. Experimental evaluation, using both the two-stage detector Faster R-CNN and the one-stage detector RetinaNet, demonstrated the proposed approach's effectiveness.
Main Authors: | Lifei He, Ming Jiang, Ryutarou Ohbuchi, Takahiko Furuya, Min Zhang, Pengfei Li |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2020-01-01 |
Series: | Scientific Programming |
ISSN: | 1875-919X |
Online Access: | http://dx.doi.org/10.1155/2020/8839979 |
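The abstract describes fusing each lateral bottom-up feature map with the up-sampled top-down map using an image-adaptive weight produced by a scale attention module. The paper's record here includes no code, so the following is only a minimal NumPy sketch of that fusion idea under stated assumptions: a hypothetical attention function that global-average-pools the lateral map, passes it through a learned linear projection, and emits a single scalar blending weight. The function names, the pooling choice, nearest-neighbor up-sampling, and the scalar (rather than per-channel or per-pixel) weight are all illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(fmap):
    # Nearest-neighbor 2x up-sampling of a (C, H, W) feature map,
    # standing in for the FPN's top-down up-sampling step.
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def scale_attention_weight(fmap, w, b):
    # Hypothetical scale attention module: global average pool over
    # spatial dims, then a learned linear projection and a sigmoid,
    # yielding a scalar weight in (0, 1) adapted to this input.
    pooled = fmap.mean(axis=(1, 2))        # shape (C,)
    return sigmoid(pooled @ w + b)         # scalar

def safpn_fuse(bottom_up, top_down, w, b):
    # Blend the lateral (bottom-up) map with the up-sampled top-down
    # map using the image-adaptive weight, instead of the FPN's
    # fixed element-wise addition.
    alpha = scale_attention_weight(bottom_up, w, b)
    return alpha * bottom_up + (1.0 - alpha) * upsample2x(top_down)

# Toy usage: one lateral map at 8x8 and one top-down map at 4x4.
rng = np.random.default_rng(0)
channels = 4
lateral = rng.standard_normal((channels, 8, 8))
coarse = rng.standard_normal((channels, 4, 4))
w = rng.standard_normal(channels)
fused = safpn_fuse(lateral, coarse, w, b=0.0)
```

In an actual detector this fusion would be applied at every pyramid level, with `w` and `b` trained end-to-end together with the backbone, as the abstract describes for the scale attention module.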