Summary: | Doctoral === Fu Jen Catholic University === Doctoral Program, Graduate Institute of Business Administration === 107 === While many studies have applied data mining techniques to evaluate housing prices, few have both identified the important factors and prioritized them, let alone shown the patterns of those factors.
The first part of this thesis applies five data mining techniques to identify the important factors for three major types of real estate in Taipei City: apartments, buildings, and suites. The datasets comprise 33,027 transactions with 20 structural factors, drawn from the publicly available Taiwan Actual Price Registration and covering July 2013 to the end of 2016. The five models are Decision Tree (DT), Random Forest (RF), Model Tree (MT), Artificial Neural Networks (ANN) and Multiple Regression (MR). The criteria used to measure accuracy are Mean Absolute Percentage Error (MAPE), Adjusted Coefficient of Determination (adj_R²), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Correlation (COR). The best-performing model for suites is ANN, with an adj_R² of 0.84; for apartments and buildings, the best is RF. Different housing types thus call for different models. Furthermore, the factor importance scores identify and rank the critical factors, which include floor area, administrative district, land area, and housing age.
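The five accuracy criteria named above can be computed from paired actual and predicted prices. The following is a minimal sketch using the standard textbook definitions, not the thesis's own code; the function name `evaluate`, the parameter `n_predictors`, and the sample prices are assumptions made for illustration.

```python
import math

def evaluate(actual, predicted, n_predictors):
    """Return MAPE, adj_R2, RMSE, MAE and COR for paired price lists.

    n_predictors is the number of explanatory factors in the model
    (hypothetical parameter name); it is needed only for adj_R2.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is expressed in percent and assumes no actual price is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes R^2 for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cor = cov / math.sqrt(ss_tot * var_p)

    return {"MAPE": mape, "adj_R2": adj_r2, "RMSE": rmse, "MAE": mae, "COR": cor}

# Made-up prices for illustration only; not data from the thesis.
metrics = evaluate([100, 200, 300, 400], [110, 190, 310, 390], n_predictors=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAPE, RMSE and MAE are lower-is-better, while adj_R² and COR are higher-is-better, so a model comparison across the five criteria has to read them in opposite directions.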
The second part additionally incorporates 30 environmental factors in transport-, academic-, shopping-, and living-related categories, by utilizing Google functions, in order to select the important factors and identify influencing patterns for the three major types of real estate. After these factors are incorporated, the adj_R² of the best-performing model for suites increases to 0.88, and the RF results for apartments and buildings also improve. By aggregating factor importance within each category of environmental factors, this thesis finds that housing prices vary with the different categories of environmental factors. The patterns of the housing factors are derived from Generalized Additive Models (GAM), which depict the effects of those factors in a nonlinear manner.
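Aggregating importance by category amounts to summing the per-factor importance scores within each category and ranking the totals. The sketch below shows one plausible way to do this; the function name `aggregate_importance`, the factor names, and the scores are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

def aggregate_importance(importance, category_of):
    """Sum per-factor importance scores by category, largest total first."""
    totals = defaultdict(float)
    for factor, score in importance.items():
        totals[category_of[factor]] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical factors and scores, for illustration only.
importance = {
    "distance_to_metro": 0.5,
    "distance_to_school": 0.125,
    "distance_to_market": 0.25,
    "distance_to_park": 0.125,
}
category_of = {
    "distance_to_metro": "transport",
    "distance_to_school": "academic",
    "distance_to_market": "shopping",
    "distance_to_park": "living",
}

for category, total in aggregate_importance(importance, category_of):
    print(f"{category}: {total:.3f}")
```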
By using data mining techniques, this thesis identifies the important factors and their rankings effectively. The factor selection and ranking procedure proposed here can also be adapted to improve prediction efficiency in most big data applications beyond housing transactions. Moreover, this thesis depicts the relationships, both linear and nonlinear, between those important factors and housing prices. These results are attributable to the data mining techniques employed.
|