Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes

Bibliographic Details
Main Authors: Marco Altmann, Peter Ott, Nicolaj C. Stache, Christian Waldschmidt
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Subjects: Machine learning; neural networks; radar applications; multimodal sensors; cross learning; autoencoder
Online Access: https://ieeexplore.ieee.org/document/9345685/
id doaj-5ee8db7184d44cbd97af24b39d9038f2
record_format Article
spelling doaj-5ee8db7184d44cbd97af24b39d9038f2
date_stamp 2021-03-30T15:06:45Z
volume 9
pages 22295-22303
doi 10.1109/ACCESS.2021.3056878
ieee_document 9345685
author_details Marco Altmann, https://orcid.org/0000-0001-7118-209X, Institute of Automotive Engineering and Mechatronics, Heilbronn University of Applied Sciences, Heilbronn, Germany
author_details Peter Ott, https://orcid.org/0000-0003-3513-4167, Institute of Automotive Engineering and Mechatronics, Heilbronn University of Applied Sciences, Heilbronn, Germany
author_details Nicolaj C. Stache, https://orcid.org/0000-0002-6308-0146, Institute of Automotive Engineering and Mechatronics, Heilbronn University of Applied Sciences, Heilbronn, Germany
author_details Christian Waldschmidt, https://orcid.org/0000-0003-2090-6136, Institute of Microwave Engineering, Ulm University, Ulm, Germany
collection DOAJ
language English
format Article
sources DOAJ
author Marco Altmann
Peter Ott
Nicolaj C. Stache
Christian Waldschmidt
spellingShingle Marco Altmann
Peter Ott
Nicolaj C. Stache
Christian Waldschmidt
Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
IEEE Access
Machine learning
neural networks
radar applications
multimodal sensors
cross learning
autoencoder
author_facet Marco Altmann
Peter Ott
Nicolaj C. Stache
Christian Waldschmidt
author_sort Marco Altmann
title Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
title_short Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
title_full Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
title_fullStr Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
title_full_unstemmed Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
title_sort multi-modal cross learning for an fmcw radar assisted by thermal and rgb cameras to monitor gestures and cooking processes
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description This paper proposes a multi-modal cross learning approach that augments the neural network training phase with additional sensor data. The approach is multi-modal during training (i.e., radar Range-Doppler maps, thermal camera images, and RGB camera images are used for training). During inference, the approach is single-modal (i.e., only radar Range-Doppler maps are needed for classification). The proposed approach uses a multi-modal autoencoder training step which creates a compressed data representation containing features correlated across modalities. The encoder part is then used as a pretrained network for the classification task. The benefit is that expensive sensors, such as high-resolution thermal cameras, are not needed in the application, while a higher classification accuracy is achieved thanks to the multi-modal cross learning during training. The autoencoders can also be used to generate hallucinated data for the absent sensors; the hallucinated data can be used for user interfaces, further classification, or other tasks. The proposed approach is verified on a simultaneous cooking process classification, 2 × 2 cooktop occupancy detection, and gesture recognition task. The main functionality is overboil protection and gesture control of a 2 × 2 cooktop. The multi-modal cross learning approach considerably outperforms single-modal approaches on this challenging classification task.
topic Machine learning
neural networks
radar applications
multimodal sensors
cross learning
autoencoder
url https://ieeexplore.ieee.org/document/9345685/
work_keys_str_mv AT marcoaltmann multimodalcrosslearningforanfmcwradarassistedbythermalandrgbcamerastomonitorgesturesandcookingprocesses
AT peterott multimodalcrosslearningforanfmcwradarassistedbythermalandrgbcamerastomonitorgesturesandcookingprocesses
AT nicolajcstache multimodalcrosslearningforanfmcwradarassistedbythermalandrgbcamerastomonitorgesturesandcookingprocesses
AT christianwaldschmidt multimodalcrosslearningforanfmcwradarassistedbythermalandrgbcamerastomonitorgesturesandcookingprocesses
_version_ 1724179983806496768
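
The description above outlines a concrete training scheme: a shared multi-modal autoencoder is pretrained to reconstruct radar, thermal, and RGB data from one fused latent code, after which the radar encoder alone is reused as a pretrained network for classification. The following Python/PyTorch sketch illustrates that idea under assumed input shapes, layer sizes, a mean-fusion rule, and class count; it is a minimal illustration of the technique, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 64  # assumed size of the shared compressed representation

def mlp(sizes):
    """Plain MLP with ReLU between hidden layers (illustrative stand-in
    for the paper's actual encoder/decoder networks)."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class CrossModalAE(nn.Module):
    """Autoencoder with one encoder/decoder pair per modality and a shared
    latent code. Reconstructing every modality from the fused code forces
    the code to carry features correlated across modalities."""

    def __init__(self, dims):
        super().__init__()
        self.enc = nn.ModuleDict({m: mlp([d, 256, LATENT]) for m, d in dims.items()})
        self.dec = nn.ModuleDict({m: mlp([LATENT, 256, d]) for m, d in dims.items()})

    def forward(self, inputs):
        # Fuse the per-modality codes by averaging (an assumed fusion rule).
        z = torch.stack([self.enc[m](x) for m, x in inputs.items()]).mean(dim=0)
        return z, {m: self.dec[m](z) for m in self.dec}

# Flattened input sizes per modality -- placeholder numbers, not the paper's.
dims = {"radar": 1024, "thermal": 4096, "rgb": 12288}
model = CrossModalAE(dims)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Multi-modal pretraining step: reconstruct all three modalities.
batch = {m: torch.randn(8, d) for m, d in dims.items()}  # stand-in data
z, recon = model(batch)
loss = sum(F.mse_loss(recon[m], batch[m]) for m in batch)
opt.zero_grad(); loss.backward(); opt.step()

# Single-modal inference: the pretrained radar encoder feeds a classifier head.
num_classes = 10  # assumed class count
classifier = nn.Sequential(model.enc["radar"], nn.Linear(LATENT, num_classes))
logits = classifier(batch["radar"])

# Hallucinate the absent camera data from radar alone, e.g. for a UI.
z_radar = model.enc["radar"](batch["radar"])
thermal_hallucinated = model.dec["thermal"](z_radar)

Only model.enc["radar"] plus the classifier head are needed at inference; the decoders are retained only if hallucinated thermal or RGB views of the absent sensors are wanted, e.g., for a user interface or a further classification stage.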