Single-Shot Object Detection with Split and Combine Blocks

Feature fusion is widely used in various neural network-based visual recognition tasks, such as object detection, to enhance the quality of feature representation. It is common practice for both the one-stage object detectors and the two-stage object detectors to implement feature fusion in feature...

Full description

Bibliographic Details
Main Authors: Hongwei Wang, Dahua Li, Yu Song, Qiang Gao, Zhaoyang Wang, Chunping Liu
Format: Article
Language:English
Published: MDPI AG 2020-09-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/10/18/6382
Description
Summary:Feature fusion is widely used in various neural network-based visual recognition tasks, such as object detection, to enhance the quality of feature representation. It is common practice for both the one-stage object detectors and the two-stage object detectors to implement feature fusion in feature pyramid networks (FPN) to enhance the capacity to detect objects of different scales. In this work, we propose a novel and efficient feature fusion unit, which is referred to as the Split and Combine (SC) Block, that splits the input feature maps into several parts, then processes these sub-feature maps with different emphasis, and finally gradually concatenates the outputs one-by-one. The SC block implicitly encourages the network to focus on features that are more important to the task, thus improving network efficiency and reducing inference computations. In order to prove our analysis and conclusions, a backbone network and an FPN employing this technique are assembled into a one-stage detector and evaluated on the MS COCO dataset. With the newly introduced SC block and other novel training tricks, our detector achieves a good speed-accuracy trade-off on COCO test-dev set, with 37.1% AP (average precision) at 51 FPS and 38.9% AP at 40 FPS.
ISSN:2076-3417