Summary: | YOLO v3 suffers from poor target localization accuracy, and its detection performance degrades in complex scenes with densely distributed targets and large differences in target size. To address this problem, this paper proposes an improved multi-scale target detection algorithm based on feature fusion (FF-YOLO). First, the residual structures in the Darknet53 backbone of YOLO v3 are replaced with an optimized densely connected network, FCN-DenseNet, which extracts features more effectively through feature reuse and further alleviates the vanishing-gradient problem. Second, a fourth detection scale is added to the original three-scale prediction mechanism of YOLO v3, enabling the network to learn shallower, location-rich features. Finally, a Spatial Pyramid Pooling (SPP) module is inserted before each detection layer to deeply fuse local feature information; it enlarges the receptive field of the backbone network and separates out the most significant contextual features. Experiments show that FF-YOLO effectively improves the detection accuracy of multi-scale targets in complex scenes: on the Pascal VOC2007 dataset, the mAP of FF-YOLO is 5.8% higher than that of YOLO v3, and on the MS COCO dataset its mAP for medium and small targets is 1.5% and 2.2% higher than YOLO v3, respectively.
|
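As a rough illustration of the SPP step summarized above, the sketch below shows a generic SPP block of the kind commonly used in YOLO-style detectors: parallel stride-1 max pooling at several kernel sizes followed by channel-wise concatenation, which widens the receptive field without changing the spatial resolution. This is not the authors' code; the kernel sizes (5, 9, 13) and the tensor shapes are assumptions borrowed from the usual YOLOv3-SPP configuration.

```python
# Minimal sketch of an SPP block (assumed configuration, not the paper's exact module).
import torch
import torch.nn as nn


class SPPBlock(nn.Module):
    """Spatial Pyramid Pooling: fuse local features over multiple receptive fields."""

    def __init__(self, pool_sizes=(5, 9, 13)):  # kernel sizes assumed, as in YOLOv3-SPP
        super().__init__()
        # Stride-1 max pooling with "same" padding keeps the spatial size unchanged,
        # so the pooled maps can be concatenated with the input along the channel axis.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )

    def forward(self, x):
        # Output channels = in_channels * (1 + len(pool_sizes)).
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)


if __name__ == "__main__":
    feat = torch.randn(1, 512, 13, 13)   # hypothetical backbone feature map
    fused = SPPBlock()(feat)
    print(fused.shape)                   # torch.Size([1, 2048, 13, 13])
```

In this arrangement the concatenated output would typically be reduced back to the desired channel count by a 1x1 convolution before the detection layer; that reduction step is omitted here for brevity.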