Deep feature extraction technique based on Conv1D and LSTM network for food image recognition

Bibliographic Details
Main Authors:	Sirawan Phiphitphatphaisit, Olarik Surinta
Format:	Article
Language:	English
Published:	Khon Kaen University 2021-07-01
Series:	Engineering and Applied Science Research
Subjects:	food image recognition; deep feature extraction method; long short-term memory; convolutional neural network; spatial temporal features
Online Access:	https://ph01.tci-thaijo.org/index.php/easr/article/download/243559/166486/

Description
Summary:	Health awareness is increasing globally. Changing eating habits and choosing foods wisely are key factors in a healthy life. Food images collected to design a food image recognition system are often captured with mobile devices and sometimes include non-food objects such as people, cutlery, and even food decoration styles; these are called noise food images. Such images decrease the performance of the system. Convolutional neural network (CNN) architectures have been proposed to address this issue and obtain good performance. In this study, we propose the ResNet50-LSTM network to improve the efficiency of the food image recognition system. The state-of-the-art ResNet architecture was used to extract robust features from food images, and these features served as the input to a Conv1D layer combined with a long short-term memory (LSTM) network, called Conv1D-LSTM. The output of the LSTM was then passed to a global average pooling layer and on to the softmax function to create a probability distribution. While training the CNN model, mixed data augmentation techniques were applied, which increased recognition performance by 0.6%. The results showed that the ResNet50+Conv1D-LSTM network outperformed previous works on the Food-101 dataset, achieving a best accuracy of 90.87%.
ISSN:	2539-6161, 2539-6218
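
The data flow described in the summary (deep features, then Conv1D, then LSTM, then global average pooling, then softmax) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The layer sizes, the random weights, the ReLU after the convolution, the dense layer before the softmax, and the idea of treating a 7 x 7 feature map as a sequence of 49 steps are all assumptions made for illustration; only the order of the stages and the 101 classes of Food-101 come from the record above. The names `conv1d`, `lstm`, `global_average_pool`, `classify`, and `params` are hypothetical.

```python
import math
import random


def conv1d(seq, kernels, bias):
    """Valid 1-D convolution over time, followed by ReLU (ReLU is an assumption).
    seq: T x C_in, kernels: C_out x K x C_in, returns (T - K + 1) x C_out."""
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        step = []
        for w, b in zip(kernels, bias):
            s = b + sum(w[i][c] * seq[t + i][c]
                        for i in range(k) for c in range(len(seq[0])))
            step.append(max(0.0, s))
        out.append(step)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm(seq, W, U, b, hidden):
    """Standard LSTM cell unrolled over time; returns the hidden state at
    every step (T x hidden). Gate order in W, U, b: input, forget, cell, output."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    for x in seq:
        z = [b[j]
             + sum(W[j][k] * x[k] for k in range(len(x)))
             + sum(U[j][k] * h[k] for k in range(hidden))
             for j in range(4 * hidden)]
        i_gate = [sigmoid(v) for v in z[:hidden]]
        f_gate = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g_cand = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o_gate = [sigmoid(v) for v in z[3 * hidden:]]
        c = [f_gate[j] * c[j] + i_gate[j] * g_cand[j] for j in range(hidden)]
        h = [o_gate[j] * math.tanh(c[j]) for j in range(hidden)]
        outputs.append(h)
    return outputs


def global_average_pool(seq):
    """Average each feature over the time axis: T x H -> H."""
    return [sum(step[j] for step in seq) / len(seq) for j in range(len(seq[0]))]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def classify(features, params):
    """Conv1D -> LSTM -> global average pooling -> dense (assumed) -> softmax."""
    x = conv1d(features, params["conv_w"], params["conv_b"])
    x = lstm(x, params["W"], params["U"], params["b"], params["hidden"])
    pooled = global_average_pool(x)
    logits = [bias + sum(w * v for w, v in zip(row, pooled))
              for row, bias in zip(params["dense_w"], params["dense_b"])]
    return softmax(logits)


# Toy sizes (assumptions): 49 steps stand in for a flattened 7 x 7 feature map,
# and 8 channels stand in for the much wider deep features of a real backbone.
rng = random.Random(0)
T, C_IN, C_OUT, K, HIDDEN, N_CLASSES = 49, 8, 6, 3, 5, 101
params = {
    "conv_w": [rand_matrix(rng, K, C_IN) for _ in range(C_OUT)],
    "conv_b": [0.0] * C_OUT,
    "W": rand_matrix(rng, 4 * HIDDEN, C_OUT),
    "U": rand_matrix(rng, 4 * HIDDEN, HIDDEN),
    "b": [0.0] * (4 * HIDDEN),
    "hidden": HIDDEN,
    "dense_w": rand_matrix(rng, N_CLASSES, HIDDEN),
    "dense_b": [0.0] * N_CLASSES,
}
features = rand_matrix(rng, T, C_IN, scale=1.0)  # placeholder for extracted features
probs = classify(features, params)
```

With untrained random weights the resulting distribution is close to uniform; the sketch only shows how the tensor shapes move through the stages, from a 49 x 8 input to a vector of 101 class probabilities.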

Similar Items