BiodesNet: Discriminative Multi-Modal Deep Learning for RGB-D Gesture and Face Recognition
Main Authors: | Lin, Tzu-Ying, 林子盈 |
---|---|
Other Authors: | Chiu, Ching-Te, 邱瀞德 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/22hu56 |
id | ndltd-TW-106NTHU5392071 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-106NTHU53920712019-08-21T03:41:39Z http://ndltd.ncl.edu.tw/handle/22hu56 BiodesNet: Discriminative Multi-Modal Deep Learning for RGB-D Gesture and Face Recognition 基於具鑑別力之多模組深度學習神經網路之RGB-D手勢及人臉辨識 Lin, Tzu-Ying 林子盈 Chiu, Ching-Te 邱瀞德 Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 106, 2018, 60 pages, en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description |
Master's thesis === National Tsing Hua University === Department of Computer Science === Academic year 106 === In recent years, owing to the rapid development of depth sensors and their wide range of application scenarios, using depth images for face and gesture recognition has become an important technology. Depth data provides additional information about appearance and object shape, and it is invariant to lighting and color variations. In AR/VR applications, for example, depth information is particularly valuable for gesture recognition because lighting changes are complex. Hence, with depth information, both recognition performance and security can be expected to improve.
Although many studies have been conducted, most prior work used hand-crafted features; the problem is that these do not readily extend to different datasets and require extra effort to extract from depth images. More recently, with the growth of deep convolutional neural networks, most approaches either treat RGB-D as undifferentiated four-channel data or learn features from color and depth separately, and so cannot adequately exploit the differences between the two individual feature streams.
In this thesis, we propose a CNN-based multi-modal learning framework for RGB-D gesture and face recognition. After training the color and depth networks separately, we fuse their features and add our own discriminative and associative loss functions to strengthen both the complementarity between the two modalities and the discriminability of the fused representation, thereby improving recognition performance.
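As a concrete illustration, below is a minimal PyTorch sketch of such a two-stream fusion network. The backbone layers, feature dimension, the MSE-based stand-in for the associative term, and its 0.1 weight are all illustrative assumptions; the abstract does not give the actual architecture or the definitions of the thesis's discriminative and associative losses.

```python
# A minimal sketch (not the thesis code) of a two-stream RGB-D fusion network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamCNN(nn.Module):
    """One modality stream (color or depth); a small stand-in backbone."""
    def __init__(self, in_channels: int, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),          # -> 64 x 4 x 4
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class FusionNet(nn.Module):
    """Concatenates the two per-modality features, then classifies."""
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        self.rgb = StreamCNN(3, feat_dim)     # color stream
        self.depth = StreamCNN(1, feat_dim)   # depth stream
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, depth):
        f_rgb, f_d = self.rgb(rgb), self.depth(depth)
        logits = self.classifier(torch.cat([f_rgb, f_d], dim=1))
        return logits, f_rgb, f_d

def associate_loss(f_rgb, f_d):
    # Hypothetical stand-in for the associative term: pull the two
    # modalities' features for the same sample toward each other.
    return F.mse_loss(f_rgb, f_d)

# One hypothetical training step on random data (24 classes as an example).
model = FusionNet(num_classes=24)
rgb = torch.randn(8, 3, 64, 64)
depth = torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 24, (8,))
logits, f_rgb, f_d = model(rgb, depth)
# Cross-entropy drives discrimination; the weighted extra term ties modalities.
loss = F.cross_entropy(logits, labels) + 0.1 * associate_loss(f_rgb, f_d)
loss.backward()
```

The key design point this sketch captures is that classification operates on the fused (concatenated) features while the auxiliary loss acts on the per-modality features, so the two streams stay aligned without collapsing into identical representations.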
We performed experiments on RGB-D gesture and face datasets. The results show that our multi-modal learning method achieves 97.8% and 99.7% classification accuracy on the ASL Finger Spelling and IIITD Face RGB-D datasets, respectively. In addition, we map a color face image and its corresponding depth image into a feature vector and convert it into a compact global descriptor. With 256-bit global descriptors, the equal error rate is 5.663%. Extracting a global descriptor from an image takes only 0.83 seconds with GPU acceleration, and comparing two 256-bit global descriptors takes just 22.7 microseconds without GPU acceleration.
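For the matching step, here is a minimal sketch of how a 256-dimensional feature vector could be packed into a 256-bit global descriptor and compared by Hamming distance. The sign-threshold binarization is an assumption for illustration; the abstract does not specify the thesis's binarization scheme.

```python
# A minimal sketch, assuming sign-threshold binarization, of 256-bit
# descriptor packing and CPU-only Hamming-distance comparison.
import numpy as np

def to_descriptor(feat: np.ndarray) -> bytes:
    """Binarize a 256-D feature to 32 bytes (256 bits): bit = 1 if dim > 0."""
    bits = (feat > 0).astype(np.uint8)
    return np.packbits(bits).tobytes()

def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two packed descriptors."""
    x = np.frombuffer(a, dtype=np.uint8) ^ np.frombuffer(b, dtype=np.uint8)
    return int(np.unpackbits(x).sum())

# Verification decision: accept if the distance falls under a tuned threshold,
# which is where an equal-error-rate operating point would be chosen.
f1, f2 = np.random.randn(256), np.random.randn(256)
d1, d2 = to_descriptor(f1), to_descriptor(f2)
print(hamming(d1, d2))  # 0..256; smaller means more similar faces
```

Comparing packed bits with XOR and a popcount is exactly the kind of operation that runs in microseconds on a plain CPU, which is consistent with the reported 22.7-microsecond comparison time.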