Object Detection and Classification Using RetinaNet with Clustered Anchors and Multi-Subnets

Bibliographic Details
Main Authors: Sin-Cih Liu, 劉心慈
Other Authors: Jenn-Jier James Lien
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/m56sgv
Description
Summary: Master's thesis, National Cheng Kung University (國立成功大學), Department of Computer Science and Information Engineering (資訊工程學系), academic year 107.

Object detection and classification have long been popular problems in computer vision, and many deep-learning-based methods have been proposed in recent years. Among these methods, RetinaNet achieves the best trade-off between detection precision and speed. RetinaNet introduced a novel loss function, Focal Loss, which down-weights the loss of easy, well-classified examples so that training concentrates on hard examples. This research instead focuses on the network architecture itself and proposes three modifications to it in order to achieve higher average precision and average recall.

First, the Feature Pyramid Network (FPN) is modified. The FPN produces multi-scale feature maps inside the network without explicitly resizing the input image, and these maps are later used to detect objects of different scales. In this research, the element-wise addition that merges feature maps in the original FPN is replaced by concatenation, which better preserves the information from both sets of feature maps. Second, the k-means algorithm is applied to the widths and heights of all ground-truth bounding boxes, and the ratio between the width and height of each resulting cluster center is used as an anchor aspect ratio. This work calls these "clustered anchors"; they make the anchors better suited to the dataset. Finally, multiple subnets ("multi-subnets") are attached to feature maps of different levels, so that objects of different scales are detected and classified by different subnets.

Experimental results show that all three modifications improve both average precision and average recall. This research also demonstrates a real application: tool defect detection. A patch-based method is adopted: the full stitched image of a tool is first divided into patches, detection and classification are performed on each patch separately, and the per-patch results are then combined to form the final detection result.
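The summary credits RetinaNet's Focal Loss with shifting training toward hard examples. The sketch below writes out the published binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), next to plain cross-entropy for a single prediction. The defaults α = 0.25 and γ = 2 come from the original RetinaNet paper, not from this thesis, and the function names are illustrative only.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy; p is the predicted probability that y == 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)**gamma is close to 0 for easy examples (p_t close to 1)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare an easy background anchor (p = 0.01) with a hard, misclassified one (p = 0.9).
for p in (0.01, 0.9):
    ratio = focal_loss(p, 0) / cross_entropy(p, 0)
    print(f"p={p}: focal/CE = {ratio:.2g}")
```

Because the log term cancels, the focal/cross-entropy ratio is simply α_t(1 - p_t)^γ. That gives about 7.5e-05 for the easy anchor and about 0.61 for the hard one, so the easy example's contribution is suppressed far more strongly.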
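The FPN change in the summary can be illustrated with a toy sketch in plain Python, where feature maps are nested lists shaped [channels][height][width]. It follows the standard FPN top-down step: 2x nearest-neighbour upsampling of the coarser map, then merging it with the lateral map. It contrasts the original element-wise sum with concatenation along the channel axis. The helper names are hypothetical. The lateral 1x1 convolutions, and any convolution that might follow the concatenation to reduce the doubled channel count, are omitted; they are assumptions about a full implementation, not details given in the thesis.

```python
from typing import List, Tuple

Channel = List[List[float]]   # one H x W grid
FeatureMap = List[Channel]    # C channels, i.e. shape [C][H][W]


def shape(fm: FeatureMap) -> Tuple[int, int, int]:
    return len(fm), len(fm[0]), len(fm[0][0])


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling, as in the FPN top-down pathway."""
    out = []
    for ch in fm:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def merge_add(lateral: FeatureMap, top_down: FeatureMap) -> FeatureMap:
    """Original FPN merge: element-wise sum, so C, H and W must all match."""
    if shape(lateral) != shape(top_down):
        raise ValueError("element-wise addition needs identical shapes")
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(lateral, top_down)]


def merge_concat(lateral: FeatureMap, top_down: FeatureMap) -> FeatureMap:
    """Channel concatenation: both inputs are kept and C becomes C1 + C2."""
    if shape(lateral)[1:] != shape(top_down)[1:]:
        raise ValueError("concatenation needs identical H and W")
    return lateral + top_down
```

Addition folds the two inputs into one sum and requires identical shapes. Concatenation keeps both sets of values side by side at the cost of twice as many channels, which is the information-preserving property the summary describes.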
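The clustered-anchor step can be sketched as Lloyd's k-means over the (width, height) pairs of the ground-truth boxes. The summary does not specify the distance measure, the initialization, or which way the ratio is taken. This sketch therefore assumes squared Euclidean distance, a deterministic initialization spread over the boxes sorted by area, and height/width ratios. `kmeans_wh` and `clustered_anchor_ratios` are hypothetical names. The resulting ratios would stand in for RetinaNet's default aspect ratios of {1:2, 1:1, 2:1}.

```python
from typing import List, Tuple

WH = Tuple[float, float]  # (width, height) of a ground-truth box


def sq_dist(a: WH, b: WH) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_wh(boxes: List[WH], k: int, max_iter: int = 100) -> List[WH]:
    """Lloyd's k-means on (w, h) pairs using squared Euclidean distance."""
    if not 1 <= k <= len(boxes):
        raise ValueError("k must be between 1 and the number of boxes")
    # Deterministic init: k boxes spread evenly through the area ordering.
    by_area = sorted(boxes, key=lambda b: b[0] * b[1])
    step = (len(by_area) - 1) / max(k - 1, 1)
    centers = [by_area[round(i * step)] for i in range(k)]
    for _ in range(max_iter):
        groups: List[List[WH]] = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(range(k), key=lambda c: sq_dist((w, h), centers[c]))
            groups[nearest].append((w, h))
        # New center = mean of its members; an empty cluster keeps its center.
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def clustered_anchor_ratios(boxes: List[WH], k: int) -> List[float]:
    """Aspect ratios (height / width) of the k cluster centers, sorted."""
    return sorted(h / w for w, h in kmeans_wh(boxes, k))
```

Published anchor-clustering variants sometimes use an IoU-based distance instead of Euclidean distance. The sketch keeps the plain k-means the summary names.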
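The patch-based defect-detection pipeline in the summary can be sketched in three stages: tiling, per-patch detection, and merging. The summary does not say whether patches overlap or how per-patch results are combined. This sketch assumes a fixed patch size and stride, shifts each patch's boxes back into global image coordinates, and removes duplicates from overlapping patches with greedy per-class non-maximum suppression. The detector is a caller-supplied function (`detect_patch`, hypothetical) that receives the patch origin and size and returns boxes in patch-local coordinates. All names here are illustrative, not taken from the thesis.

```python
from typing import Callable, List, Tuple

# A detection: (x1, y1, x2, y2, score, label)
Det = Tuple[float, float, float, float, float, str]


def tile_origins(length: int, patch: int, stride: int) -> List[int]:
    """Patch start offsets along one axis so that [0, length) is covered."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:  # add a last patch flush with the edge
        starts.append(length - patch)
    return starts


def iou(a: Det, b: Det) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], iou_thr: float) -> List[Det]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda det: det[4], reverse=True):
        if all(d[5] != k[5] or iou(d, k) < iou_thr for k in kept):
            kept.append(d)
    return kept


def detect_by_patches(width: int, height: int, patch: int, stride: int,
                      detect_patch: Callable[[int, int, int, int], List[Det]],
                      iou_thr: float = 0.5) -> List[Det]:
    """Run detect_patch on every tile, map boxes to global coords, then merge."""
    merged: List[Det] = []
    for y0 in tile_origins(height, patch, stride):
        for x0 in tile_origins(width, patch, stride):
            w, h = min(patch, width - x0), min(patch, height - y0)
            for x1, y1, x2, y2, score, label in detect_patch(x0, y0, w, h):
                # shift patch-local coordinates by the patch origin
                merged.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score, label))
    return nms(merged, iou_thr)
```

With overlapping patches, a defect near a patch border can be reported by two or more patches. The suppression step is one simple way to keep only one copy of it. Without overlap (stride equal to patch size), the merge reduces to concatenating the shifted per-patch results.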