Purchase Probability Prediction : Predicting likelihood of a new customer returning for a second purchase using machine learning methods

Bibliographic Details
Main Authors: Alstermark, Olivia; Stolt, Evangelina
Format: Others
Language: English
Published: Umeå universitet, Institutionen för matematik och matematisk statistik (Umeå University, Department of Mathematics and Mathematical Statistics), 2021
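Subjects: Purchase Probability Prediction; Machine Learning Models; Well-Calibrated Probabilities; Imbalanced Data; Data Protection; Mathematics (Matematik)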
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184831
id ndltd-UPSALLA1-oai-DiVA.org-umu-184831
record_format oai_dc
datestamp 2021-06-25T05:37:09Z
type Student thesis (info:eu-repo/semantics/bachelorThesis; text; application/pdf)
rights info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Purchase Probability Prediction
Machine Learning Models
Well-Calibrated Probabilities
Imbalanced Data
Data Protection
Mathematics
Matematik
description When a company evaluates whether a customer is a potential prospect, one of the key questions is whether the customer will generate profit in the long run. One possible step towards answering this question is to predict the likelihood that the customer returns to the company after the initial purchase. The aim of this master's thesis is to investigate whether machine learning techniques can be used to predict the likelihood of a new customer returning for a second purchase within a certain time frame. To investigate to what degree machine learning techniques can predict the probability of return, a number of different model setups of Logistic Lasso, Support Vector Machine and Extreme Gradient Boosting are tested. Model development is performed to ensure well-calibrated probability predictions and, possibly, to overcome the difficulty arising from an imbalanced ratio of returning to non-returning customers. Throughout the thesis work, a number of actions are taken to account for data protection. One such action is to add noise to the response feature, ensuring that the true fraction of returning and non-returning customers cannot be derived. To further guarantee data protection, the axis values of evaluation plots are removed and evaluation metrics are scaled. Nevertheless, it is still possible to select the superior model among all investigated models. The results show that the best-performing model is a Platt-calibrated Extreme Gradient Boosting model, which performs much better than the other models on the considered evaluation metrics while also providing predicted probabilities of high quality. Further, the results indicate that the setups investigated to account for imbalanced data do not improve model performance.
The main conclusion is that machine learning techniques can produce high-quality probability predictions of whether new customers will return to a company for a second purchase within a certain time frame. This provides a powerful tool for a company when evaluating potential prospects.
author Alstermark, Olivia
Stolt, Evangelina
title Purchase Probability Prediction : Predicting likelihood of a new customer returning for a second purchase using machine learning methods
publisher Umeå universitet, Institutionen för matematik och matematisk statistik
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184831