Improve YOLOv3 using dilated spatial pyramid module for multi-scale object detection

Effectively and efficiently recognizing multi-scale objects is one of the key challenges of applying deep convolutional neural networks to object detection. YOLOv3 (You only look once v3) is a state-of-the-art object detector with good performance in both accuracy and speed; h...

Bibliographic Details
Main Authors:	Xiaoguo Zhang, Ye Gao, Huiqing Wang, Qing Wang
Format:	Article
Language:	English
Published:	SAGE Publishing 2020-07-01
Series:	International Journal of Advanced Robotic Systems
Online Access:	https://doi.org/10.1177/1729881420936062

collection	DOAJ
issn	1729-8814
description	Effectively and efficiently recognizing multi-scale objects is one of the key challenges of applying deep convolutional neural networks to object detection. YOLOv3 (You only look once v3) is a state-of-the-art object detector with good performance in both accuracy and speed; however, scale variation remains a challenging problem that needs to be addressed. Considering that the detection performance on multi-scale objects is related to the receptive fields of the network, in this work we propose a novel dilated spatial pyramid module that integrates multi-scale information to deal effectively with the scale variation problem. First, the input of the dilated spatial pyramid is fed into multiple parallel branches with different dilation rates to generate feature maps with different receptive fields. Then, the input of the dilated spatial pyramid and the outputs of the different branches are concatenated to integrate multi-scale information. Moreover, the dilated spatial pyramid is integrated with YOLOv3 in front of the first detection header to form the dilated spatial pyramid-You only look once model. Experimental results on PASCAL VOC2007 demonstrate that the dilated spatial pyramid-You only look once model outperforms other state-of-the-art methods in mean average precision while still keeping a satisfactory real-time detection speed. For 416 × 416 input, the dilated spatial pyramid-You only look once model achieves 82.2% mean average precision at 56 frames per second, 3.9% higher than YOLOv3, with only a slight drop in speed.
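The branch-and-concatenate idea in the description can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the record gives neither the dilation rates nor the kernel sizes, so the rates `(1, 2, 3)`, the all-ones kernel, and the function names below are assumptions chosen only to show how larger dilation rates widen the receptive field and how the input is stacked with the branch outputs.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):       # positions outside count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def dilated_pyramid(signal: Sequence[float], kernel: Sequence[float],
                    dilation_rates: Sequence[int]) -> List[List[float]]:
    """Stack the input with one branch output per dilation rate (as channels)."""
    channels = [[float(x) for x in signal]]  # the input itself is kept
    for rate in dilation_rates:
        channels.append(dilated_conv1d(signal, kernel, rate))
    return channels


# Hypothetical settings, not taken from the paper.
rates = (1, 2, 3)
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
features = dilated_pyramid(impulse, [1, 1, 1], rates)

for rate, branch in zip(rates, features[1:]):
    print(rate, effective_kernel_size(3, rate), branch)
```

Each branch responds to the same impulse over a wider span as the dilation rate grows, while the number of kernel weights stays fixed at three. A two-dimensional version in a deep learning framework would follow the same pattern, with convolution layers in place of `dilated_conv1d` and channel-wise concatenation in place of the list stacking.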

Similar Items