Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection
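Effectively and efficiently recognizing multi-scale objects is one of the key challenges in applying deep convolutional neural networks to object detection. YOLOv3 (You Only Look Once v3) is a state-of-the-art object detector that performs well in both accuracy and speed; however, scale variation remains a challenging problem. Considering that the detection performance on multi-scale objects is related to the receptive fields of the network, in this work we propose a novel dilated spatial pyramid (DSP) module that integrates multi-scale information to deal effectively with scale variation. First, the input of the DSP module is fed into multiple parallel branches with different dilation rates to generate feature maps with different receptive fields. Then, the input of the module and the outputs of the branches are concatenated to integrate multi-scale information. Finally, the DSP module is integrated into YOLOv3 in front of the first detection header, yielding the dilated spatial pyramid-YOLO (DSP-YOLO) model. Experimental results on PASCAL VOC2007 show that DSP-YOLO outperforms other state-of-the-art methods in mean average precision (mAP) while keeping a satisfactory real-time detection speed. For 416 × 416 input, DSP-YOLO achieves 82.2% mAP at 56 frames per second, 3.9% higher than YOLOv3, with only a slight drop in speed.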
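The dilated spatial pyramid module described in this paper feeds its input into parallel branches with different dilation rates and then concatenates the input with all branch outputs. The framework-free Python below is a minimal sketch of that idea under stated assumptions. It is not the authors' implementation. The function names, the dilation rates 1, 2 and 4, and the 3 × 3 kernels are hypothetical. Each branch applies one shared kernel to every input channel, which is a simplification. Learned weights, normalization and activations are omitted. The sketch also shows why the branches see different receptive fields: a k × k kernel with dilation d spans k + (k − 1)(d − 1) pixels.

```python
from typing import List

Grid = List[List[float]]   # one channel, H x W
FeatureMap = List[Grid]    # C channels, each H x W


def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Single-channel cross-correlation (the CNN 'convolution') with stride 1,
    zero 'same' padding and the given dilation rate; assumes an odd k x k kernel."""
    h, w, k = len(x), len(x[0]), len(kernel)
    pad = (effective_kernel_size(k, dilation) - 1) // 2  # keeps output at H x W
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    r, c = i - pad + a * dilation, j - pad + b * dilation
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += kernel[a][b] * x[r][c]
            out[i][j] = acc
    return out


def dilated_spatial_pyramid(x: FeatureMap, kernels: List[Grid], rates: List[int]) -> FeatureMap:
    """Hypothetical DSP sketch: one parallel branch per dilation rate, each applying its
    kernel to every input channel, then channel-wise concatenation of the input with all
    branch outputs. Output has C * (1 + len(rates)) channels at the input's H x W."""
    branch_outputs: FeatureMap = []
    for kernel, rate in zip(kernels, rates):
        branch_outputs.extend(dilated_conv2d(channel, kernel, rate) for channel in x)
    return x + branch_outputs  # list concatenation == stacking along the channel axis


def footprint(grid: Grid) -> int:
    """Height of the band of rows holding non-zero responses (assumes at least one)."""
    rows = [i for i, row in enumerate(grid) if any(v != 0.0 for v in row)]
    return rows[-1] - rows[0] + 1


if __name__ == "__main__":
    # Toy 1-channel 9 x 9 input: a single bright pixel in the centre.
    image = [[0.0] * 9 for _ in range(9)]
    image[4][4] = 1.0
    ones3x3 = [[1.0] * 3 for _ in range(3)]
    rates = [1, 2, 4]  # assumed dilation rates, not taken from the paper
    out = dilated_spatial_pyramid([image], [ones3x3] * len(rates), rates)
    print(len(out))  # input channel + one channel per branch
    for rate, channel in zip(rates, out[1:]):
        print(rate, effective_kernel_size(3, rate), footprint(channel))
```

Running the demo prints `4`, then `1 3 3`, `2 5 5` and `4 9 9`. These are the channel count, followed by each rate's effective kernel size and the measured spread of its response to the single pixel. Concatenating the untouched input keeps the original-scale features alongside the wider-context branches. The sketch does not model YOLOv3 itself or where the module is inserted before the first detection header.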
Main Authors: | Xiaoguo Zhang, Ye Gao, Huiqing Wang, Qing Wang |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2020-07-01 |
Series: | International Journal of Advanced Robotic Systems |
ISSN: | 1729-8814 |
Online Access: | https://doi.org/10.1177/1729881420936062 |