BiodesNet: Discriminative Multi-Modal Deep Learning for RGB-D Gesture and Face Recognition

In recent years, the rapid development of depth sensors and their wide range of application scenarios have made depth-based face and gesture recognition an important technology. Depth data provides additional information about appearance and object shape, and it is invariant to lighting and color variations. In AR/VR applications, for example, depth information is especially valuable for gesture recognition because lighting conditions change in complex ways. With depth information, both recognition performance and security can therefore be expected to improve. Although many studies have been conducted, most prior work relies on hand-crafted features, which do not readily extend to different datasets and require extra effort to extract from depth images. More recent approaches based on deep convolutional neural networks either treat RGB-D as undifferentiated four-channel data or learn features from color and depth separately, and thus cannot adequately exploit the differences between the two modalities. In this thesis, we propose a CNN-based multi-modal learning framework for RGB-D gesture and face recognition. After training the color and depth networks separately, we apply feature fusion and add our own discriminative and associative loss functions to strengthen the complementarity and discrimination between the two modalities and improve performance. We performed experiments on gesture and face RGB-D datasets. Our multi-modal learning method achieves 97.8% and 99.7% classification accuracy on the ASL Finger Spelling and IIITD Face RGB-D datasets, respectively. In addition, we map a color face image and its corresponding depth image into a feature vector and convert it into a global descriptor, obtaining an equal error rate of 5.663% with 256-bit global descriptors. Extracting a global descriptor from an image takes only 0.83 seconds with GPU acceleration, and comparing two 256-bit global descriptors takes just 22.7 microseconds without GPU acceleration.
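
The abstract describes a two-stream design: a color CNN and a depth CNN trained separately, feature-level fusion, and additional discriminative and associative loss terms. The sketch below only illustrates that general structure and is not the thesis' implementation; the layer sizes, the 64x64 input assumption, and the simple MSE-based association term standing in for the thesis' custom losses are all placeholders (PyTorch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamCNN(nn.Module):
    """Single-modality feature extractor (hypothetical layer sizes)."""
    def __init__(self, in_channels, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 16 * 16, feat_dim)  # assumes 64x64 inputs

    def forward(self, x):
        return self.fc(torch.flatten(self.conv(x), 1))

class TwoStreamFusion(nn.Module):
    """Color and depth streams whose features are concatenated before classification."""
    def __init__(self, num_classes, feat_dim=256):
        super().__init__()
        self.rgb_stream = StreamCNN(3, feat_dim)     # color branch
        self.depth_stream = StreamCNN(1, feat_dim)   # depth branch
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_stream(rgb)
        f_depth = self.depth_stream(depth)
        fused = torch.cat([f_rgb, f_depth], dim=1)   # feature-level fusion
        return self.classifier(fused), f_rgb, f_depth

def total_loss(logits, labels, f_rgb, f_depth, alpha=0.1):
    # Cross-entropy on the fused prediction plus a simple cross-modal
    # association term; the latter is only a placeholder for the thesis'
    # discriminative and associative losses.
    return F.cross_entropy(logits, labels) + alpha * F.mse_loss(f_rgb, f_depth)

# Example: 24 static letter classes, as in the ASL Finger Spelling dataset.
model = TwoStreamFusion(num_classes=24)
```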

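The abstract also reports compacting the fused face feature into a 256-bit global descriptor and matching two descriptors in about 22.7 microseconds on the CPU. A minimal sketch of one plausible realization follows, assuming sign binarization of a 256-dimensional feature and a Hamming-distance comparison; the thesis' actual binarization scheme and decision threshold are not specified here, so both are placeholders.

```python
import numpy as np

def to_global_descriptor(feature: np.ndarray) -> np.ndarray:
    """Binarize a length-256 feature into a packed 32-byte (256-bit) descriptor."""
    bits = (feature > 0).astype(np.uint8)   # sign thresholding (assumption)
    return np.packbits(bits)                # 256 bits -> 32 bytes

def hamming_distance(d1: np.ndarray, d2: np.ndarray) -> int:
    """Count differing bits between two packed descriptors."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

# Usage: declare a match when few bits differ (threshold is hypothetical).
f1, f2 = np.random.randn(256), np.random.randn(256)
d1, d2 = to_global_descriptor(f1), to_global_descriptor(f2)
same_person = hamming_distance(d1, d2) < 64
```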

Bibliographic Details
Main Author: Lin, Tzu-Ying (林子盈)
Advisor: Chiu, Ching-Te (邱瀞德)
Title (Chinese): 基於具鑑別力之多模組深度學習神經網路之RGB-D手勢及人臉辨識
Degree: Master's (碩士), Department of Computer Science (資訊工程學系所), National Tsing Hua University (國立清華大學), academic year 106
Format: Thesis (學位論文)
Language: English (en_US)
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/22hu56