Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network

Master's === National Tsing Hua University === Department of Computer Science === 107 === Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have applied deep lea...


Bibliographic Details
Main Authors: Shih, Kuan-Hung, 施冠宏
Other Authors: Chiu, Ching-Te
Format: Others
Language:en_US
Published: 2018
Online Access:http://ndltd.ncl.edu.tw/handle/d7d2x3
id ndltd-TW-107NTHU5392011
record_format oai_dc
spelling ndltd-TW-107NTHU53920112019-09-19T03:30:12Z http://ndltd.ncl.edu.tw/handle/d7d2x3 Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network 基於剪枝及多特徵連接輔助區域候選網路之即時物件偵測 Shih, Kuan-Hung 施冠宏 Master's National Tsing Hua University Department of Computer Science 107 Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have applied deep learning in object detection and have achieved successful results. For object detection, there are two types of network architectures: one-stage and two-stage. This study is based on the widely used two-stage architecture, called Faster R-CNN, and our goal is to improve the inference time to achieve real-time speed without losing accuracy. First, we use pruning to reduce the number of parameters and the amount of computation, which is expected to reduce accuracy as a result. Therefore, we propose a multi-feature assisted region proposal network composed of assisted multi-feature concatenation and a reduced region proposal network to improve accuracy. Assisted multi-feature concatenation combines feature maps from different convolutional layers as inputs for a reduced region proposal network. With our proposed method, the network can find regions of interest (ROIs) more accurately. Thus, it compensates for the loss of accuracy due to pruning. Finally, we use ZF-Net and VGG16 as backbones, and test the network on the PASCAL VOC 2007 dataset. The results show that we can compress ZF-Net from 227 MB to 45 MB and save 66% of computation. We can also compress VGG16 from 523 MB to 144 MB and save 77% of computation. Consequently, the inference speed is 40 FPS for ZF-Net and 27 FPS for VGG16. With these significant compression rates, the accuracies are 60.2% mean average precision (mAP) and 69.1% mAP for ZF-Net and VGG16, respectively.
Chiu, Ching-Te 邱瀞德 2018 degree thesis ; thesis 55 en_US
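The abstract's first step is pruning the backbone to cut parameters and computation. The thesis does not state its pruning criterion in this record, so the sketch below uses a common stand-in, magnitude-based filter pruning (rank each convolutional filter by its L1 norm and keep the strongest); all shapes and the keep ratio are illustrative assumptions.

```python
import numpy as np

# Sketch of magnitude-based filter pruning (an assumed criterion, not
# necessarily the one used in the thesis). A conv layer's weights have
# shape (out_channels, in_channels, kH, kW); filters with the smallest
# L1 norms are dropped, shrinking both model size and FLOPs.

def prune_filters(weights, keep_ratio):
    """Keep the `keep_ratio` fraction of filters with the largest L1 norm."""
    n_filters = weights.shape[0]
    n_keep = max(1, int(n_filters * keep_ratio))
    norms = np.abs(weights).reshape(n_filters, -1).sum(axis=1)  # L1 norm per filter
    keep_idx = np.sort(np.argsort(norms)[-n_keep:])             # strongest filters, in order
    return weights[keep_idx], keep_idx

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 3, 3))        # hypothetical first conv of a backbone
pruned, kept = prune_filters(w, keep_ratio=0.5)
print(pruned.shape)                           # (32, 3, 3, 3)
```

In practice the next layer's input channels must be pruned to match `kept`, and the network is fine-tuned afterward to recover accuracy, which is the loss the thesis's assisted region proposal network is designed to compensate for.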
collection NDLTD
language en_US
format Others
sources NDLTD
description 碩士 === 國立清華大學 === 資訊工程學系所 === 107 === Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have ap- plied deep learning in object detection and have achieved successful results. For object detection, there are two types of network architectures: one-stage and two- stage. This study is based on the widely-used two-stage architecture, called Faster R-CNN, and our goal is to improve the inference time to achieve real-time speed without losing accuracy. First, we use pruning to reduce the number of parameters and the amount of computation, which is expected to reduce accuracy as a result. Therefore, we propose a multi-feature assisted region proposal network composed of assisted multi-feature concatenation and a reduced region proposal network to improve accuracy. Assisted multi-feature concatenation combines feature maps from dif- ferent convolutional layers as inputs for a reduced region proposal network. With our proposed method, the network can find regions of interest (ROIs) more accu- rately. Thus, it compensates for loss of accuracy due to pruning. Finally, we use ZF-Net and VGG16 as backbones, and test the network on the PASCAL VOC 2007 dataset. The results show that we can compress ZF-Net from 227 MB to 45 MB and save 66% of computation. We can also compress VGG16 from 523 MB to 144 MB and save 77% of computation. Consequently, the inference speed is 40 FPS for ZF-Net and 27 FPS for VGG16. With the significant compression rates, the accuracies are 60.2% mean average precision (mAP) and 69.1% mAP for ZF-Net and VGG16, respectively.
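The abstract describes assisted multi-feature concatenation: feature maps from different convolutional layers are combined and fed to a reduced region proposal network. Layers at different depths have different spatial sizes, so a minimal sketch must resize them to a common resolution before stacking along the channel axis. The layer names, shapes, and nearest-neighbor resizing below are illustrative assumptions, not the thesis's exact design.

```python
import numpy as np

# Sketch of multi-feature concatenation: resize feature maps from several
# conv layers to one spatial size, then concatenate along channels to form
# the input of a (reduced) region proposal network.

def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resize of a (C, H, W) feature map."""
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source col for each output col
    return fmap[:, rows[:, None], cols[None, :]]

conv3 = np.random.rand(256, 56, 56)   # shallow layer: fine spatial detail
conv4 = np.random.rand(512, 28, 28)   # middle layer
conv5 = np.random.rand(512, 14, 14)   # deep layer: strong semantics

target = (28, 28)                     # align everything to the middle layer
stacked = np.concatenate(
    [resize_nearest(f, *target) for f in (conv3, conv4, conv5)], axis=0)
print(stacked.shape)                  # (1280, 28, 28)
```

Mixing shallow and deep features this way gives the proposal network both fine localization detail and high-level semantics, which is how, per the abstract, the method recovers ROI quality lost to pruning.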
author2 Chiu, Ching-Te
author_facet Chiu, Ching-Te
Shih, Kuan-Hung
施冠宏
author Shih, Kuan-Hung
施冠宏
spellingShingle Shih, Kuan-Hung
施冠宏
Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
author_sort Shih, Kuan-Hung
title Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
title_short Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
title_full Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
title_fullStr Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
title_full_unstemmed Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
title_sort real-time object detection via pruning and concatenated multi-feature assisted region proposal network
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/d7d2x3
work_keys_str_mv AT shihkuanhung realtimeobjectdetectionviapruningandconcatenatedmultifeatureassistedregionproposalnetwork
AT shīguānhóng realtimeobjectdetectionviapruningandconcatenatedmultifeatureassistedregionproposalnetwork
AT shihkuanhung jīyújiǎnzhījíduōtèzhēngliánjiēfǔzhùqūyùhòuxuǎnwǎnglùzhījíshíwùjiànzhēncè
AT shīguānhóng jīyújiǎnzhījíduōtèzhēngliánjiēfǔzhùqūyùhòuxuǎnwǎnglùzhījíshíwùjiànzhēncè
_version_ 1719252485681446912