Summary: | In this thesis, we examine machine learning as a tool for predicting new customers in a B2B sales context. Using only publicly available information, we approach the problem in two ways: 1) a naive clustering-based classifier built on K-means and 2) PU-learning with a random forest adapter. We test these models with different feature sets and evaluate them using statistical measures and a discussion of the business implications. Our main finding is that PU-learning can produce results that are satisfactory for the purpose of improving the sales process, performing 4.8 times better than a random baseline classifier in the best case. The clustering-based classifier, however, was not good enough, producing only marginally better results than a random classifier in its best case. We also find that using more variables improved the models, even in high-dimensional spaces with over 60 variables.
|