Product Matching Using Image Similarity
PriceRunner is an online shopping comparison company. To maintain up-to-date prices, PriceRunner has to process large amounts of data every day. The processing of the data includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product s...
Main Authors: | Forssell, Melker; Janér, Gustav |
---|---|
Format: | Others |
Language: | English |
Published: | Uppsala universitet, Institutionen för informationsteknologi, 2020 |
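Subjects: | AI; ML; Triplet Loss; SNN; Machine Learning; Artificial Intelligence; Similarity Learning; Metric Learning; Computer Vision; Product Matching; Image Matching; Computer and Information Sciences |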
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481 |
id |
ndltd-UPSALLA1-oai-DiVA.org-uu-413481 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-uu-413481; 2020-06-24T03:32:33Z; Product Matching Using Image Similarity; eng; Forssell, Melker; Janér, Gustav; Uppsala universitet, Institutionen för informationsteknologi; 2020; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481; UPTEC IT, 1401-5749 ; 20016; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
AI; ML; Triplet Loss; SNN; Machine Learning; Artificial Intelligence; Similarity Learning; Metric Learning; Computer Vision; Product Matching; Image Matching; Computer and Information Sciences |
description |
PriceRunner is an online shopping comparison company. To maintain up-to-date prices, PriceRunner has to process large amounts of data every day. This processing includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product such as title, description, price, and often one image of the product. PriceRunner has previously implemented a text-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach in which the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is a large number of classes, a limited number of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 top-level hierarchies at PriceRunner, consisting of 17 product categories. The results varied between product categories: some proved less suitable for image-based classification, while others excelled. The model handles new classes relatively well without any retraining, or with shorter retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner. |
author |
Forssell, Melker; Janér, Gustav |
title |
Product Matching Using Image Similarity |
publisher |
Uppsala universitet, Institutionen för informationsteknologi |
publishDate |
2020 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481 |
_version_ |
1719323690267574272 |
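
The description above says that a siamese network was trained with the triplet loss so that similar images map closer together and dissimilar images further apart in a vector space. The record gives no implementation details, so the following is only a minimal sketch of that idea in plain Python. The function names (`euclidean`, `triplet_loss`, `nearest_product`), the margin of 0.2, the use of unsquared Euclidean distance, and the toy two-dimensional embeddings are all assumptions made for illustration; they are not taken from the thesis.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, not from the record
) -> float:
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` further from
    the anchor than the positive is; otherwise it is the size of the gap.
    """
    d_pos = euclidean(anchor, positive)  # same product: should be small
    d_neg = euclidean(anchor, negative)  # different product: should be large
    return max(0.0, d_pos - d_neg + margin)


def nearest_product(
    offer_embedding: Sequence[float],
    product_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Return the id of the known product whose embedding is closest to the offer."""
    return min(
        product_embeddings,
        key=lambda pid: euclidean(offer_embedding, product_embeddings[pid]),
    )


if __name__ == "__main__":
    # Toy embeddings standing in for the output of a trained network.
    known = {"shoe": [0.0, 1.0], "hat": [3.0, 4.0]}
    print(triplet_loss([0.0, 0.0], known["shoe"], known["hat"]))
    print(nearest_product([0.0, 0.9], known))
```

Because matching reduces to a nearest-neighbour lookup over stored embeddings, a new product class can in principle be added by embedding its images, which fits the description's remark that new classes are handled with little or no retraining.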