Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network
Master's === National Tsing Hua University === Department of Computer Science === 107 === Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have applied deep lea...
Main Authors: | Shih, Kuan-Hung 施冠宏 |
---|---|
Other Authors: | Chiu, Ching-Te 邱瀞德 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/d7d2x3 |
id |
ndltd-TW-107NTHU5392011 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NTHU53920112019-09-19T03:30:12Z http://ndltd.ncl.edu.tw/handle/d7d2x3 Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network 基於剪枝及多特徵連接輔助區域候選網路之即時物件偵測 Shih, Kuan-Hung 施冠宏 Master's, National Tsing Hua University, Department of Computer Science, 107 Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have applied deep learning in object detection and have achieved successful results. For object detection, there are two types of network architectures: one-stage and two-stage. This study is based on the widely-used two-stage architecture, called Faster R-CNN, and our goal is to improve the inference time to achieve real-time speed without losing accuracy. First, we use pruning to reduce the number of parameters and the amount of computation, which is expected to reduce accuracy as a result. Therefore, we propose a multi-feature assisted region proposal network composed of assisted multi-feature concatenation and a reduced region proposal network to improve accuracy. Assisted multi-feature concatenation combines feature maps from different convolutional layers as inputs for a reduced region proposal network. With our proposed method, the network can find regions of interest (ROIs) more accurately. Thus, it compensates for loss of accuracy due to pruning. Finally, we use ZF-Net and VGG16 as backbones, and test the network on the PASCAL VOC 2007 dataset. The results show that we can compress ZF-Net from 227 MB to 45 MB and save 66% of computation. We can also compress VGG16 from 523 MB to 144 MB and save 77% of computation. Consequently, the inference speed is 40 FPS for ZF-Net and 27 FPS for VGG16. With the significant compression rates, the accuracies are 60.2% mean average precision (mAP) and 69.1% mAP for ZF-Net and VGG16, respectively.
Chiu, Ching-Te 邱瀞德 2018 dissertation ; thesis 55 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others
|
sources |
NDLTD |
description |
Master's === National Tsing Hua University === Department of Computer Science === 107 === Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have applied deep learning in object detection and have achieved successful results. For object detection, there are two types of network architectures: one-stage and two-stage. This study is based on the widely-used two-stage architecture, called Faster R-CNN, and our goal is to improve the inference time to achieve real-time speed without losing accuracy.
First, we use pruning to reduce the number of parameters and the amount of computation, which is expected to reduce accuracy as a result. Therefore, we propose a multi-feature assisted region proposal network composed of assisted multi-feature concatenation and a reduced region proposal network to improve accuracy. Assisted multi-feature concatenation combines feature maps from different convolutional layers as inputs for a reduced region proposal network. With our proposed method, the network can find regions of interest (ROIs) more accurately. Thus, it compensates for loss of accuracy due to pruning. Finally, we use ZF-Net and VGG16 as backbones, and test the network on the PASCAL VOC 2007 dataset.
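The pruning step described above can be illustrated with a minimal sketch. The record does not reproduce the thesis's actual pruning code, so this is a generic magnitude-based pruning illustration rather than the author's exact method; the function name and the toy layer are hypothetical.

```python
# Hedged sketch: generic magnitude-based weight pruning, NOT the thesis's
# exact algorithm (which is not reproduced in this record).
import random

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |value|."""
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    # Magnitude threshold: the k-th smallest absolute value.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

random.seed(0)
layer = [random.gauss(0.0, 1.0) for _ in range(4096)]  # toy flattened conv layer
pruned = prune_by_magnitude(layer, 0.66)  # ~66%, echoing the ZF-Net saving
sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned)
print(f"achieved sparsity: {sparsity:.2f}")
```

In practice, channel- or filter-level pruning (rather than this elementwise toy) is what yields real speedups on dense hardware, which is consistent with the computation savings the abstract reports.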
The results show that we can compress ZF-Net from 227 MB to 45 MB and save 66% of computation. We can also compress VGG16 from 523 MB to 144 MB and save 77% of computation. Consequently, the inference speed is 40 FPS for ZF-Net and 27 FPS for VGG16. With the significant compression rates, the accuracies are 60.2% mean average precision (mAP) and 69.1% mAP for ZF-Net and VGG16, respectively.
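The quoted figures can be cross-checked arithmetically. Note that the 66% and 77% figures refer to computation saved, while the MB figures imply a separate model-size reduction (about 80% for ZF-Net and 72% for VGG16); the sketch below only recomputes size cuts and per-frame latencies from the values quoted in the abstract, and makes no claims beyond them.

```python
# Pure arithmetic on the figures quoted in the abstract; no new claims.
results = {
    "ZF-Net": {"before_mb": 227, "after_mb": 45, "fps": 40},
    "VGG16":  {"before_mb": 523, "after_mb": 144, "fps": 27},
}
summary = {}
for name, r in results.items():
    summary[name] = {
        "size_cut": 1 - r["after_mb"] / r["before_mb"],  # fraction of size removed
        "ms_per_frame": 1000.0 / r["fps"],               # latency implied by FPS
    }
    print(name, summary[name])
```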
|
author2 |
Chiu, Ching-Te |
author_facet |
Chiu, Ching-Te Shih, Kuan-Hung 施冠宏 |
author |
Shih, Kuan-Hung 施冠宏 |
spellingShingle |
Shih, Kuan-Hung 施冠宏 Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network |
author_sort |
Shih, Kuan-Hung |
title |
Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network |
title_short |
Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network |
title_full |
Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network |
title_fullStr |
Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network |
title_full_unstemmed |
Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network |
title_sort |
real-time object detection via pruning and concatenated multi-feature assisted region proposal network |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/d7d2x3 |
work_keys_str_mv |
AT shihkuanhung realtimeobjectdetectionviapruningandconcatenatedmultifeatureassistedregionproposalnetwork AT shīguānhóng realtimeobjectdetectionviapruningandconcatenatedmultifeatureassistedregionproposalnetwork AT shihkuanhung jīyújiǎnzhījíduōtèzhēngliánjiēfǔzhùqūyùhòuxuǎnwǎnglùzhījíshíwùjiànzhēncè AT shīguānhóng jīyújiǎnzhījíduōtèzhēngliánjiēfǔzhùqūyùhòuxuǎnwǎnglùzhījíshíwùjiànzhēncè |
_version_ |
1719252485681446912 |