Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
This paper proposes a multi-modal cross learning approach that augments the neural network training phase with additional sensor data. The approach is multi-modal during training (i.e., radar Range-Doppler maps, thermal camera images, and RGB camera images are used for training). In inference, the approach is single-modal (i.e., only radar Range-Doppler maps are needed for classification).
Main Authors: | Marco Altmann, Peter Ott, Nicolaj C. Stache, Christian Waldschmidt |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | Machine learning; neural networks; radar applications; multimodal sensors; cross learning; autoencoder |
Online Access: | https://ieeexplore.ieee.org/document/9345685/ |
id
doaj-5ee8db7184d44cbd97af24b39d9038f2
record_format |
Article |
spelling |
IEEE Access, vol. 9, pp. 22295-22303, 2021-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2021.3056878 (IEEE Xplore document 9345685). Record updated 2021-03-30T15:06:45Z.
Marco Altmann (https://orcid.org/0000-0001-7118-209X), Peter Ott (https://orcid.org/0000-0003-3513-4167), and Nicolaj C. Stache (https://orcid.org/0000-0002-6308-0146): Institute of Automotive Engineering and Mechatronics, Heilbronn University of Applied Sciences, Heilbronn, Germany.
Christian Waldschmidt (https://orcid.org/0000-0003-2090-6136): Institute of Microwave Engineering, Ulm University, Ulm, Germany.
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Marco Altmann
Peter Ott
Nicolaj C. Stache
Christian Waldschmidt
author_sort |
Marco Altmann |
title |
Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes |
title_sort |
multi-modal cross learning for an fmcw radar assisted by thermal and rgb cameras to monitor gestures and cooking processes |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
This paper proposes a multi-modal cross learning approach that augments the neural network training phase with additional sensor data. The approach is multi-modal during training (i.e., radar Range-Doppler maps, thermal camera images, and RGB camera images are used for training). In inference, the approach is single-modal (i.e., only radar Range-Doppler maps are needed for classification). The proposed approach uses a multi-modal autoencoder training step which creates a compressed data representation containing correlated features across the modalities. The encoder part is then used as a pretrained network for the classification task. The benefit is that expensive sensors such as high-resolution thermal cameras are not needed in the application, while a higher classification accuracy is achieved thanks to the multi-modal cross learning during training. The autoencoders can also be used to generate hallucinated data for the absent sensors; the hallucinated data can be used for user interfaces, further classification, or other tasks. The proposed approach is verified on a combined task of simultaneous cooking process classification, 2 × 2 cooktop occupancy detection, and gesture recognition. The main functionalities are overboil protection and gesture control of a 2 × 2 cooktop. The multi-modal cross learning approach considerably outperforms single-modal approaches on this challenging classification task.
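As a rough, hypothetical illustration of the cross-learning idea described above (not the authors' actual network, which the record does not specify), the following PyTorch sketch assumes a radar-only encoder trained to reconstruct all three modalities. The layer sizes, the 32 × 32 input resolution, the equal loss weighting, and the 10-class head are placeholder assumptions.

```python
# Hypothetical sketch only: the paper's real architecture, resolutions,
# and loss weighting are not given in this record.
import torch
import torch.nn as nn

class RadarEncoder(nn.Module):
    """Compresses a radar Range-Doppler map into a shared latent code."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, latent_dim),  # assumes 32x32 input maps
        )

    def forward(self, rd_map):
        return self.net(rd_map)

class ModalityDecoder(nn.Module):
    """Reconstructs (or hallucinates) one sensor modality from the latent code."""
    def __init__(self, latent_dim=128, out_channels=1):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 32 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 32, 8, 8)
        return self.net(h)

encoder = RadarEncoder()
decoders = nn.ModuleDict({
    "radar": ModalityDecoder(out_channels=1),    # reconstruction target
    "thermal": ModalityDecoder(out_channels=1),  # cross-modal target
    "rgb": ModalityDecoder(out_channels=3),      # cross-modal target
})
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoders.parameters()), lr=1e-3)
mse = nn.MSELoss()

# --- Stage 1: multi-modal autoencoder training (all sensors available) ---
radar = torch.randn(8, 1, 32, 32)  # dummy Range-Doppler batch
targets = {"radar": radar,
           "thermal": torch.randn(8, 1, 32, 32),
           "rgb": torch.randn(8, 3, 32, 32)}
z = encoder(radar)
loss = sum(mse(decoders[m](z), targets[m]) for m in decoders)
opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: the pretrained encoder becomes the classifier backbone ---
num_classes = 10  # placeholder, e.g. cooking processes / gestures
classifier = nn.Sequential(encoder, nn.ReLU(), nn.Linear(128, num_classes))
logits = classifier(radar)  # inference needs the radar modality only

# Hallucinated thermal view of the scene from radar data alone:
thermal_hallucinated = decoders["thermal"](encoder(radar))
```

The design point this sketch tries to capture: because the thermal and RGB reconstruction losses back-propagate through the radar encoder, the latent code is pushed to carry camera-correlated features even though only radar data is seen at inference time, and the camera decoders then double as generators for the hallucinated sensor data.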
topic |
Machine learning
neural networks
radar applications
multimodal sensors
cross learning
autoencoder
url |
https://ieeexplore.ieee.org/document/9345685/ |