Using Variational Multi-view Learning for Classification of Grocery Items
Summary: An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries—fruits, vegetables, and packaged products—where all images have been taken inside grocery stores to resemble a shopping scenario.
Main Authors: | Marcus Klasson, Cheng Zhang, Hedvig Kjellström |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2020-11-01 |
Series: | Patterns |
Subjects: | DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666389920301914 |
id |
doaj-390317ce708748d59c00258da8a72256 |
record_format |
Article |
spelling |
doaj-390317ce708748d59c00258da8a72256 2020-11-25T04:03:50Z eng Elsevier Patterns 2666-3899 2020-11-01 Vol. 1, Issue 8, Article 100143. Using Variational Multi-view Learning for Classification of Grocery Items. Marcus Klasson (Division of Robotics, Perception, and Learning, Lindstedtsvägen 24, 114 28 Stockholm, Sweden; corresponding author); Cheng Zhang (Microsoft Research Ltd, 21 Station Road, Cambridge CB1 2FB, UK; corresponding author); Hedvig Kjellström (Division of Robotics, Perception, and Learning, Lindstedtsvägen 24, 114 28 Stockholm, Sweden; corresponding author). |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Marcus Klasson Cheng Zhang Hedvig Kjellström |
spellingShingle |
Marcus Klasson Cheng Zhang Hedvig Kjellström Using Variational Multi-view Learning for Classification of Grocery Items Patterns DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem |
author_facet |
Marcus Klasson Cheng Zhang Hedvig Kjellström |
author_sort |
Marcus Klasson |
title |
Using Variational Multi-view Learning for Classification of Grocery Items |
title_short |
Using Variational Multi-view Learning for Classification of Grocery Items |
title_full |
Using Variational Multi-view Learning for Classification of Grocery Items |
title_fullStr |
Using Variational Multi-view Learning for Classification of Grocery Items |
title_full_unstemmed |
Using Variational Multi-view Learning for Classification of Grocery Items |
title_sort |
using variational multi-view learning for classification of grocery items |
publisher |
Elsevier |
series |
Patterns |
issn |
2666-3899 |
publishDate |
2020-11-01 |
description |
Summary: An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries—fruits, vegetables, and packaged products—where all images have been taken inside grocery stores to resemble a shopping scenario. Additionally, we download iconic images and text descriptions for each item that can be utilized for better representation learning of groceries. We select a multi-view generative model, which can combine the different item information into lower-dimensional representations. The experiments show that utilizing the additional information yields higher accuracies on classifying grocery items than using the natural images alone. We observe that iconic images help to construct representations separated by visual differences of the items, while text descriptions enable the model to distinguish between visually similar items by their different ingredients. The Bigger Picture: In recent years, several computer vision-based assistive technologies for helping visually impaired people have been released on the market. We study a special case in which visual capability is often important when searching for objects: grocery shopping. To enable assistive vision devices for grocery shopping, data representing the grocery items have to be available. We therefore provide a challenging dataset of smartphone images of grocery items, resembling the shopping scenario with an assistive vision device. Our dataset is publicly available to encourage other researchers to evaluate their computer vision models on grocery item classification in real-world environments. The next step would be to deploy the trained models on mobile devices, such as smartphone applications, to evaluate whether the models can perform effectively in real time with human users. This dataset is a step toward enabling these technologies to make everyday life easier for the visually impaired. |
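The "multi-view generative model" mentioned in the description fuses several item views (natural image, iconic image, text) into one low-dimensional latent representation. The sketch below illustrates one common fusion strategy for such variational multi-view models, a product-of-experts combination of per-view Gaussian posteriors; it is an illustrative toy, not the paper's actual architecture, and all encoder weights, view dimensions, and the linear "encoders" are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Toy linear 'encoder': maps one view to Gaussian latent parameters."""
    return x @ W_mu, x @ W_logvar

def product_of_experts(mus, logvars):
    """Fuse per-view Gaussians into one posterior (precision-weighted mean).

    Includes a standard-normal prior expert (precision 1), so the fused
    posterior is always at least as concentrated as each single view.
    """
    precisions = [np.exp(-lv) for lv in logvars]
    total_prec = sum(precisions) + 1.0  # + prior expert
    mu = sum(p * m for p, m in zip(precisions, mus)) / total_prec
    return mu, -np.log(total_prec)

# Hypothetical views: image features (dim 8) and text features (dim 5); latent dim 3.
x_img, x_txt = rng.normal(size=8), rng.normal(size=5)
enc_img = (rng.normal(size=(8, 3)), rng.normal(size=(8, 3)))
enc_txt = (rng.normal(size=(5, 3)), rng.normal(size=(5, 3)))

mu_i, lv_i = encode(x_img, *enc_img)
mu_t, lv_t = encode(x_txt, *enc_txt)
mu, logvar = product_of_experts([mu_i, mu_t], [lv_i, lv_t])

# Reparameterization trick: sample the shared low-dimensional representation,
# which a downstream classifier would consume.
z = mu + np.exp(0.5 * logvar) * rng.normal(size=3)
print(z.shape)  # (3,)
```

A classifier trained on `z` can then benefit from whichever views are available; a missing view simply drops its expert from the product, which is one reason this fusion is popular for multi-view data.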
topic |
DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem |
url |
http://www.sciencedirect.com/science/article/pii/S2666389920301914 |
work_keys_str_mv |
AT marcusklasson usingvariationalmultiviewlearningforclassificationofgroceryitems AT chengzhang usingvariationalmultiviewlearningforclassificationofgroceryitems AT hedvigkjellstrom usingvariationalmultiviewlearningforclassificationofgroceryitems |
_version_ |
1724439059259981824 |