NOISE IMPACT REDUCTION IN CLASSIFICATION APPROACH PREDICTING SOCIAL NETWORKS CHECK-IN LOCATIONS

Bibliographic Details
Main Author: Jedari Fathi, Elnaz
Format: Others
Published: OpenSIUC 2017
Online Access: https://opensiuc.lib.siu.edu/theses/2110
https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=3124&context=theses
Description
Summary: In August 2010, Facebook entered the self-reported positioning world by offering a check-in service to its users. The service lets users share their physical location using the GPS receiver in a mobile device such as a smartphone, tablet, or smartwatch. As social networks have grown in popularity, large datasets of recorded check-ins have been collected. Analyzing these datasets reveals valuable information and patterns in users’ check-in behavior as well as in places’ check-in histories. The results can be applied in several areas, including business planning and financial decisions, for instance to provide location-based deals. In this thesis, we use a novel data mining methodology to learn from big check-in data and predict the next check-in place based only on places’ history, with no reference to individual users. To this end, we study a large Facebook check-in dataset. Its location coordinates contain a high level of noise because the data come from many collection sources, namely users’ mobile devices. The research question is how a noise impact reduction technique can be used to improve the performance of the prediction model. We design our own noise handling mechanism to deal with feature noise. The predictive model is built with the Random Forest classification algorithm in a shared-memory parallel environment. We show how the performance of the predictors improves when the impact of noise is minimized. The solution is a preprocessing approach for cleansing feature noise, implemented in R, that runs quickly on big check-in datasets.
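
The summary does not describe the thesis's noise handling mechanism in detail, and the thesis implementation is in R. As a rough illustration only, the Python sketch below shows one generic way to reduce coordinate noise in labelled training check-ins before a classifier is fitted: for each place, it drops check-ins that lie unusually far from that place's median location. The record layout `(place_id, x, y)`, the function name `filter_noisy_checkins`, and the parameters `k` and `min_scale` are all assumptions made for this example and are not taken from the thesis.

```python
from collections import defaultdict
from statistics import median


def filter_noisy_checkins(checkins, k=3.0, min_scale=1e-9):
    """Drop training check-ins that lie far from their place's median location.

    checkins:  list of (place_id, x, y) tuples (hypothetical schema).
    k:         cutoff, in multiples of the per-place median distance.
    min_scale: floor for the scale so places with identical points still work.
    """
    # Group coordinates by their labelled place.
    by_place = defaultdict(list)
    for place_id, x, y in checkins:
        by_place[place_id].append((x, y))

    kept = []
    for place_id, points in by_place.items():
        # Robust centre of the place: coordinate-wise median.
        mx = median(p[0] for p in points)
        my = median(p[1] for p in points)
        # Euclidean distance of every check-in from that centre.
        dists = [((x - mx) ** 2 + (y - my) ** 2) ** 0.5 for x, y in points]
        # Robust scale: the median of those distances.
        scale = max(median(dists), min_scale)
        for (x, y), d in zip(points, dists):
            if d <= k * scale:
                kept.append((place_id, x, y))
    return kept
```

The filtered records could then be passed to any Random Forest implementation that trains trees in parallel on a shared-memory machine. Whether a median-based filter of this kind resembles the mechanism designed in the thesis cannot be determined from the summary.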