BiodesNet: Discriminative Multi-Modal Deep Learning for RGB-D Gesture and Face Recognition

In recent years, the rapid development of depth sensors and their wide range of application scenarios have made depth-based face and gesture recognition an important technology. Depth data provides additional information about appearance and object shape, and it is invariant to lighting and color variations. In AR/VR applications, for example, depth information is especially valuable for gesture recognition because lighting conditions change in complex ways. With depth information, both recognition performance and security can therefore be expected to improve. Although many studies have been conducted, most prior work relies on hand-crafted features, which do not readily extend to different datasets and require extra effort to extract from depth images. More recent approaches based on deep convolutional neural networks either treat RGB-D as undifferentiated four-channel data or learn features from color and depth separately, and thus cannot adequately exploit the differences between the two modalities. In this thesis, we propose a CNN-based multi-modal learning framework for RGB-D gesture and face recognition. After training the color and depth networks separately, we apply feature fusion and add our own discriminative and associative loss functions to strengthen the complementarity and discrimination between the two modalities and improve performance. We performed experiments on gesture and face RGB-D datasets. Our multi-modal learning method achieves 97.8% and 99.7% classification accuracy on the ASL Finger Spelling and IIITD Face RGB-D datasets, respectively. In addition, we map a color face image and its corresponding depth image into a feature vector and convert it into a global descriptor, obtaining an equal error rate of 5.663% with 256-bit global descriptors. Extracting a global descriptor from an image takes only 0.83 seconds with GPU acceleration, and comparing two 256-bit global descriptors takes just 22.7 microseconds without GPU acceleration.
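
The abstract describes a two-stream design: a color CNN and a depth CNN trained separately, feature-level fusion, and additional discriminative and associative loss terms. The sketch below only illustrates that general structure and is not the thesis' implementation; the layer sizes, the 64x64 input assumption, and the simple MSE-based association term standing in for the thesis' custom losses are all placeholders (PyTorch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamCNN(nn.Module):
    """Single-modality feature extractor (hypothetical layer sizes)."""
    def __init__(self, in_channels, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 16 * 16, feat_dim)  # assumes 64x64 inputs

    def forward(self, x):
        return self.fc(torch.flatten(self.conv(x), 1))

class TwoStreamFusion(nn.Module):
    """Color and depth streams whose features are concatenated before classification."""
    def __init__(self, num_classes, feat_dim=256):
        super().__init__()
        self.rgb_stream = StreamCNN(3, feat_dim)     # color branch
        self.depth_stream = StreamCNN(1, feat_dim)   # depth branch
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_stream(rgb)
        f_depth = self.depth_stream(depth)
        fused = torch.cat([f_rgb, f_depth], dim=1)   # feature-level fusion
        return self.classifier(fused), f_rgb, f_depth

def total_loss(logits, labels, f_rgb, f_depth, alpha=0.1):
    # Cross-entropy on the fused prediction plus a simple cross-modal
    # association term; the latter is only a placeholder for the thesis'
    # discriminative and associative losses.
    return F.cross_entropy(logits, labels) + alpha * F.mse_loss(f_rgb, f_depth)

# Example: 24 static letter classes, as in the ASL Finger Spelling dataset.
model = TwoStreamFusion(num_classes=24)
```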

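The abstract also reports compacting the fused face feature into a 256-bit global descriptor and matching two descriptors in about 22.7 microseconds on the CPU. A minimal sketch of one plausible realization follows, assuming sign binarization of a 256-dimensional feature and a Hamming-distance comparison; the thesis' actual binarization scheme and decision threshold are not specified here, so both are placeholders.

```python
import numpy as np

def to_global_descriptor(feature: np.ndarray) -> np.ndarray:
    """Binarize a length-256 feature into a packed 32-byte (256-bit) descriptor."""
    bits = (feature > 0).astype(np.uint8)   # sign thresholding (assumption)
    return np.packbits(bits)                # 256 bits -> 32 bytes

def hamming_distance(d1: np.ndarray, d2: np.ndarray) -> int:
    """Count differing bits between two packed descriptors."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

# Usage: declare a match when few bits differ (threshold is hypothetical).
f1, f2 = np.random.randn(256), np.random.randn(256)
d1, d2 = to_global_descriptor(f1), to_global_descriptor(f2)
same_person = hamming_distance(d1, d2) < 64
```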

Bibliographic Details
Main Author: Lin, Tzu-Ying (林子盈)
Advisor: Chiu, Ching-Te (邱瀞德)
Title (Chinese): 基於具鑑別力之多模組深度學習神經網路之RGB-D手勢及人臉辨識
Degree: Master's (碩士), Department of Computer Science (資訊工程學系所), National Tsing Hua University (國立清華大學), academic year 106
Format: Thesis (學位論文)
Language: English (en_US)
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/22hu56