Product Matching Using Image Similarity

PriceRunner is an online shopping comparison company. To maintain up-to-date prices, PriceRunner has to process large amounts of data every day. The processing of the data includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product s...

Bibliographic Details
Main Authors: Forssell, Melker; Janér, Gustav
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för informationsteknologi 2020
Subjects:
AI
ML
SNN
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481
id ndltd-UPSALLA1-oai-DiVA.org-uu-413481
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-413481
2020-06-24T03:32:33Z
Product Matching Using Image Similarity
eng
Forssell, Melker
Janér, Gustav
Uppsala universitet, Institutionen för informationsteknologi
2020
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481
UPTEC IT, 1401-5749 ; 20016
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic AI
ML
Triplet Loss
SNN
Machine Learning
Artificial Intelligence
Similarity Learning
Metric Learning
Computer Vision
Product Matching
Image Matching
Computer and Information Sciences
Data- och informationsvetenskap
description PriceRunner is an online shopping comparison company. To maintain up-to-date prices, PriceRunner has to process large amounts of data every day. This processing includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product, such as the title, description, price, and often one image of the product. PriceRunner has previously implemented a text-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach in which the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is a large number of classes and a limited number of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 top-level hierarchies at PriceRunner, consisting of 17 product categories. The results varied between product categories. Some categories proved to be less suitable for image-based classification, while others excelled. The model handles new classes relatively well with little or no retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner.
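The triplet loss idea mentioned in the description can be illustrated with a minimal sketch. It is not the thesis implementation: the record does not state the network architecture, distance measure, or margin, so the Euclidean distance, the margin of 0.2, and the function names `triplet_loss` and `nearest_product` are all assumptions chosen for illustration. The loss is zero once the anchor image's embedding is closer to the positive (same product) than to the negative (different product) by at least the margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """One common form of the triplet loss (margin value is an assumption).

    Penalises the triplet unless the anchor is closer to the positive
    than to the negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def nearest_product(offer_embedding, product_embeddings):
    """Hypothetical matching step: return the id of the known product
    whose embedding lies closest to the offer's embedding."""
    return min(
        product_embeddings,
        key=lambda pid: euclidean(offer_embedding, product_embeddings[pid]),
    )
```

Because matching reduces to a nearest-neighbour lookup in the embedding space, a new product class can in principle be supported by adding its embedding to the lookup table, which is consistent with the description's remark about handling new classes with little or no retraining.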
author Forssell, Melker
Janér, Gustav
title Product Matching Using Image Similarity
publisher Uppsala universitet, Institutionen för informationsteknologi
publishDate 2020
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481
work_keys_str_mv AT forssellmelker productmatchingusingimagesimilarity
AT janergustav productmatchingusingimagesimilarity
_version_ 1719323690267574272