Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection

Effectively and efficiently recognizing multi-scale objects is one of the key challenges in applying deep convolutional neural networks to object detection. YOLOv3 (You only look once v3) is a state-of-the-art object detector with good performance in both accuracy and speed; h...

Bibliographic Details
Main Authors: Xiaoguo Zhang, Ye Gao, Huiqing Wang, Qing Wang
Format: Article
Language: English
Published: SAGE Publishing 2020-07-01
Series: International Journal of Advanced Robotic Systems
Online Access: https://doi.org/10.1177/1729881420936062
id doaj-b70e476021404aa6ac99541ae14a8354
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Xiaoguo Zhang
Ye Gao
Huiqing Wang
Qing Wang
title Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection
publisher SAGE Publishing
series International Journal of Advanced Robotic Systems
issn 1729-8814
publishDate 2020-07-01
description Effectively and efficiently recognizing multi-scale objects is one of the key challenges in applying deep convolutional neural networks to object detection. YOLOv3 (You only look once v3) is a state-of-the-art object detector with good performance in both accuracy and speed; however, scale variation remains a challenging problem that still needs to be addressed. Because the detection performance on multi-scale objects is related to the receptive fields of the network, in this work we propose a novel dilated spatial pyramid module that integrates multi-scale information to deal effectively with the scale variation problem. First, the input of the dilated spatial pyramid is fed into multiple parallel branches with different dilation rates to generate feature maps with different receptive fields. Then, the input of the dilated spatial pyramid and the outputs of the different branches are concatenated to integrate multi-scale information. Moreover, the dilated spatial pyramid is integrated into YOLOv3 in front of the first detection head to form the dilated spatial pyramid-You only look once model. Experimental results on PASCAL VOC2007 demonstrate that the dilated spatial pyramid-You only look once model outperforms other state-of-the-art methods in mean average precision while still keeping a satisfactory real-time detection speed. For 416 × 416 input, the dilated spatial pyramid-You only look once model achieves 82.2% mean average precision at 56 frames per second, 3.9% higher than YOLOv3 with only a slight drop in speed.
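The description ties detection performance to receptive fields and describes parallel branches with different dilation rates whose outputs are concatenated with the module input. The sketch below illustrates only the arithmetic behind that idea: how the span of a dilated kernel grows with the dilation rate, and how concatenation adds up channel counts. The 3 × 3 kernel, the dilation rates (1, 2, 4), and the channel counts are hypothetical values chosen for illustration; the record does not state the settings used in the article.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by one dilated kernel along an axis.

    Standard relation for dilated convolution: k + (k - 1) * (d - 1).
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def concatenated_channels(in_channels: int, branch_channels: int,
                          num_branches: int) -> int:
    """Channels after concatenating the module input with every branch output."""
    return in_channels + branch_channels * num_branches


# Hypothetical settings, not taken from the article.
KERNEL_SIZE = 3
DILATION_RATES = (1, 2, 4)

# Each parallel branch sees a different span of the same input feature map.
spans = {d: effective_kernel_size(KERNEL_SIZE, d) for d in DILATION_RATES}

# Assumed 512 input channels and 128 channels per branch.
out_channels = concatenated_channels(512, 128, len(DILATION_RATES))

for rate, span in spans.items():
    print(f"dilation {rate}: kernel covers {span} x {span} input pixels")
print(f"channels after concatenation: {out_channels}")
```

Under these assumed values, the three branches cover progressively larger neighbourhoods of the same feature map without adding kernel weights, which is the sense in which parallel dilated branches supply multi-scale context before concatenation.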
url https://doi.org/10.1177/1729881420936062