Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method
Accurate house price forecasts are very important for formulating national economic policies. In this paper, we offer an effective method to predict houses’ sale prices. Our algorithm includes one-hot encoding to convert text data into numeric data, feature correlation to select only the most correl...
Main Authors: | Karshiev Sanjar, Olimov Bekhzod, Jaesoo Kim, Anand Paul, Jeonghong Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-04-01 |
Series: | ISPRS International Journal of Geo-Information |
Subjects: | house price prediction; handling missing data; random forest |
Online Access: | https://www.mdpi.com/2220-9964/9/4/227 |
id |
doaj-719a4e3e4a7546ad9d800990507cb35b |
---|---|
record_format |
Article |
spelling |
doaj-719a4e3e4a7546ad9d800990507cb35b; 2020-11-25T03:10:55Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; 2220-9964; 2020-04-01; 9; 227; 227; 10.3390/ijgi9040227; Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method; Karshiev Sanjar (0), Olimov Bekhzod (1), Jaesoo Kim (2), Anand Paul (3), Jeonghong Kim (4); (0–4) The School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea; https://www.mdpi.com/2220-9964/9/4/227; house price prediction; handling missing data; random forest |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Karshiev Sanjar, Olimov Bekhzod, Jaesoo Kim, Anand Paul, Jeonghong Kim |
author_sort |
Karshiev Sanjar |
title |
Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method |
publisher |
MDPI AG |
series |
ISPRS International Journal of Geo-Information |
issn |
2220-9964 |
publishDate |
2020-04-01 |
description |
Accurate house price forecasts are important for formulating national economic policies. In this paper, we propose a method for predicting house sale prices. Our algorithm uses one-hot encoding to convert text data into numeric data, feature correlation to select only the most correlated variables, and a technique for handling missing data. This technique handles missing data in large datasets with the K-nearest neighbor algorithm based on the most correlated features (KNN–MCF). To the best of our knowledge, no previous research has focused on the most important features when dealing with missing observations. With the random forest algorithm, the proposed method achieves a prediction accuracy of 92.01%, which is higher than that of the other typical machine learning prediction algorithms compared. |
topic |
house price prediction; handling missing data; random forest |
url |
https://www.mdpi.com/2220-9964/9/4/227 |
_version_ |
1724656443646279680 |
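
The description in this record outlines the imputation step only in words: missing values are filled by a K-nearest neighbor search based on the most correlated features. The following is a minimal Python sketch of one possible reading of that idea, not the authors' implementation. The function names `pearson` and `knn_mcf_impute`, the parameters `top_m` and `k`, the sample rows, and the choice to rank features by absolute Pearson correlation with the column being imputed are all assumptions made for illustration; the record does not specify them. The sketch also assumes that the selected features have no missing values and are on comparable scales, and it leaves out the one-hot encoding and random forest steps.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def knn_mcf_impute(rows, target, features, top_m=2, k=2):
    """Fill missing `target` values from the k nearest complete rows.

    Distance is Euclidean and uses only the `top_m` features with the
    largest absolute correlation to `target` (an illustrative assumption).
    Returns new rows plus the list of selected feature names.
    """
    target_col = [r[target] for r in rows]
    ranked = sorted(
        features,
        key=lambda f: abs(pearson([r[f] for r in rows], target_col)),
        reverse=True,
    )
    selected = ranked[:top_m]

    # Rows with an observed target act as donors for the missing ones.
    donors = [r for r in rows if r[target] is not None]

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if new_row[target] is None:
            point = [row[f] for f in selected]
            nearest = sorted(
                donors,
                key=lambda d: math.dist(point, [d[f] for f in selected]),
            )[:k]
            new_row[target] = sum(d[target] for d in nearest) / len(nearest)
        filled.append(new_row)
    return filled, selected


# Hypothetical sample data: the last row is missing its price.
rows = [
    {"area": 50, "rooms": 2, "age": 30, "price": 100},
    {"area": 60, "rooms": 2, "age": 5, "price": 120},
    {"area": 80, "rooms": 3, "age": 20, "price": 160},
    {"area": 100, "rooms": 4, "age": 10, "price": 200},
    {"area": 82, "rooms": 3, "age": 25, "price": None},
]

filled, selected = knn_mcf_impute(
    rows, target="price", features=["area", "rooms", "age"], top_m=1, k=2
)
print(selected)             # ['area']
print(filled[-1]["price"])  # 180.0
```

In this toy data, `area` is the feature most correlated with `price`, so the missing price is the mean of the two rows closest in area (80 and 100). In practice the features would normally be standardized before computing distances, and a library imputer could replace the hand-written search.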