Constructing decision trees for user behavior prediction in the online consumer market

Bibliographic Details
Main Authors: Fokin, Dennis; Hagrot, Joel
Format: Others
Language: English
Published: KTH, Skolan för datavetenskap och kommunikation (CSC) 2016
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186497
Type: Student thesis (bachelor thesis), text; application/pdf; open access
Subjects: Computer Sciences; Datavetenskap (datalogi)
Description: This thesis investigates the usefulness of various aspects of product data for user behavior prediction in the online shopping market. Specifically, a data set from BestBuy was used, containing information regarding what product a user clicked on given their search query. Decision trees are machine learning algorithms used for making predictions. The decision tree algorithm ID3 was used because of its simplicity and interpretability. It uses information gain to measure how well different attributes help the tree split the set into smaller subsets. The approach was to use one decision tree for each product in the data set, and to analyze the distribution of the attributes' maximum information gains in the root splits across the various trees. For each of these splits, all possible pivot values (a pivot value being the value split on) were tried, and the pivot values were also recorded to analyze which of them resulted in the most gain. The results show that how well the query string matches the product title and how well it matches the product description are the two most important aspects, followed by the product's novelty. The number of days between the query and the two most recent reviews written before it proved a decent way to identify trends. The paper also presents how the attributes were used by analyzing the pivot value distributions, with the conclusion that many attributes were used in similar ways for most products, suggesting it might be possible to create a universal tree applicable to all products.
Regarding the usefulness of decision trees, it was found that they are not very efficient for highly volatile databases, such as those found in the online shopping market. The notion of a universal tree, however, suggests that future work might investigate whether their efficiency could be improved using this more flexible approach.
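The root-split analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single numeric attribute, a binary split of the form "value <= pivot", and non-empty input. The function names and the sample attribute (a hypothetical query-to-title match score) are invented for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels, pivot):
    """Entropy reduction from splitting into value <= pivot and value > pivot."""
    left = [y for x, y in zip(values, labels) if x <= pivot]
    right = [y for x, y in zip(values, labels) if x > pivot]
    total = len(labels)
    # Weighted average entropy of the two subsets after the split.
    remainder = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - remainder


def best_pivot(values, labels):
    """Try every distinct attribute value as a pivot; return (pivot, gain)."""
    candidates = sorted(set(values))
    scored = ((p, information_gain(values, labels, p)) for p in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: match score per query, and whether the product was clicked.
title_match = [0.1, 0.2, 0.8, 0.9]
clicked = [0, 0, 1, 1]
pivot, gain = best_pivot(title_match, clicked)
print(pivot, gain)
```

On this toy data the pivot 0.2 separates clicks from non-clicks perfectly, so it attains the maximum possible gain of one bit. Recording the best pivot and its gain per attribute and per product would give the kind of distributions the abstract describes.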