Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method

Bibliographic Details
Main Authors: Karshiev Sanjar, Olimov Bekhzod, Jaesoo Kim, Anand Paul, Jeonghong Kim
Format: Article
Language: English
Published: MDPI AG, 2020-04-01
Series: ISPRS International Journal of Geo-Information
Online Access: https://www.mdpi.com/2220-9964/9/4/227
Record ID: doaj-719a4e3e4a7546ad9d800990507cb35b
DOI: 10.3390/ijgi9040227
ISSN: 2220-9964
Affiliation (all authors): The School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
Description: Accurate house price forecasts are very important for formulating national economic policies. In this paper, we offer an effective method to predict house sale prices. Our algorithm includes one-hot encoding to convert text data into numeric data, feature correlation to select only the most correlated variables, and a technique for handling missing data. Our approach is an effective way to handle missing data in large datasets with the K-nearest neighbor algorithm based on the most correlated features (KNN–MCF). To the best of our knowledge, no previous research has focused on the most important features when dealing with missing observations. With the random forest algorithm, the proposed method reaches a prediction accuracy of 92.01%, which is higher than that of the typical machine learning prediction algorithms used for comparison.
Topics: house price prediction; handling missing data; random forest
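
The steps named in the description (one-hot encoding, selecting the most correlated features, and K-nearest-neighbor imputation restricted to those features) can be illustrated with a small sketch. This is not the authors' implementation. It is one possible reading of the KNN–MCF idea, written in plain Python. The function names, the toy `houses` data, the column names, and the values of `k` and `top_n` are all assumptions made for illustration. The sketch also assumes that the selected features have no missing values, and it skips feature scaling and the final random forest model.

```python
from math import sqrt

def one_hot(rows, column):
    """Replace a text column with 0/1 indicator columns, one per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1.0 if row[column] == cat else 0.0
        encoded.append(new_row)
    return encoded

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def most_correlated(rows, target, exclude, top_n):
    """Rank features by absolute correlation with the target; keep top_n."""
    features = [c for c in rows[0] if c != target and c not in exclude]
    scores = {}
    for f in features:
        pairs = [(r[f], r[target]) for r in rows
                 if r[f] is not None and r[target] is not None]
        scores[f] = abs(pearson([p[0] for p in pairs], [p[1] for p in pairs]))
    return sorted(features, key=lambda f: scores[f], reverse=True)[:top_n]

def knn_mcf_impute(rows, column, target, k=2, top_n=2):
    """Fill None values in `column` with the mean of the k nearest complete
    rows, measuring Euclidean distance only on the selected features."""
    selected = most_correlated(rows, target, exclude={column}, top_n=top_n)
    donors = [r for r in rows if r[column] is not None]
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row[column] is None:
            nearest = sorted(
                donors,
                key=lambda d: sqrt(sum((d[f] - row[f]) ** 2 for f in selected)),
            )[:k]
            row[column] = sum(d[column] for d in nearest) / len(nearest)
        filled.append(row)
    return filled

# Hypothetical toy data: the last house has a missing lot size.
houses = [
    {"area": 50.0, "rooms": 2.0, "zone": "urban", "lot": 100.0, "price": 150.0},
    {"area": 60.0, "rooms": 3.0, "zone": "urban", "lot": 120.0, "price": 180.0},
    {"area": 120.0, "rooms": 4.0, "zone": "rural", "lot": 400.0, "price": 300.0},
    {"area": 130.0, "rooms": 5.0, "zone": "rural", "lot": 440.0, "price": 330.0},
    {"area": 55.0, "rooms": 2.0, "zone": "urban", "lot": None, "price": 160.0},
]

encoded = one_hot(houses, "zone")
imputed = knn_mcf_impute(encoded, column="lot", target="price", k=2, top_n=2)
print(imputed[4]["lot"])  # 110.0, the mean lot size of the two nearest urban houses
```

Restricting the distance to the features most correlated with the price is the design choice the description emphasizes: weakly related columns do not influence which neighbors are chosen. In this toy data the two neighbors are the other small urban houses, so the missing lot size is filled with their average.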