Summary: | Master's thesis === National Cheng Kung University === Master's Degree Program in Business Administration (AMBA) === Academic year 104 (ROC calendar) === In big data analysis, data scientists usually analyze the entire population rather than a random sample. This research applies that concept to stock prediction. Its purpose is to predict whether the Taiwan Stock Exchange Capitalization Weighted Index (TAIEX) will rise or fall on the next trading day, building models with three machine learning methods: linear regression, decision forest regression, and a two-class decision forest. First, we select the method with the best overall prediction performance and carry it forward to the second step. In the second step, we hold out the most recent 150 days of data as the test set and build models with 4 different feature combinations to find the two best training-set sizes. Finally, we evaluate 9 different test periods using every pairing of the 4 feature combinations with the 2 training-set sizes (8 configurations). Linear regression performs best in overall prediction. The best training-set size is 279 days, and the best feature combination consists of 5 features, the fewest among the 4 combinations. The best average prediction accuracy is 64.59%, slightly lower than that of predicting directly from the NASDAQ Index. The results suggest that when predicting stock markets with machine learning, models can be built from a small number of important features without large training sets.
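The evaluation described above (a 150-day test window, a fixed training-set size such as 279, and accuracy scored on the up/down direction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the linear-regression target is the next-day TAIEX return and that its sign gives the up/down call, that the model is refit on the preceding `train_size` days before each test day, and that each day's feature row holds only information available before that day. The abstract does not list its 5 features, so the usage data is synthetic, and all helper names (`solve_linear_system`, `fit_linear_regression`, `rolling_direction_accuracy`) are hypothetical. Least squares is solved in pure Python through the normal equations so the block has no dependencies.

```python
import random


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_linear_regression(features, targets, ridge=1e-8):
    """Least squares with an intercept via the normal equations (X^T X) w = X^T y.

    A tiny ridge term (an assumption, not from the thesis) keeps the system
    solvable if two hypothetical features happen to be collinear.
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # [intercept, w1, ..., wk]


def predict(coef, feature_row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], feature_row))


def rolling_direction_accuracy(features, returns, train_size, test_size):
    """Hypothetical evaluation: for each of the last test_size days, fit on the
    preceding train_size days, predict that day's return, and count a hit when
    the predicted direction (return > 0 means up) matches the actual one."""
    n = len(returns)
    if train_size + test_size > n:
        raise ValueError("not enough days for the requested windows")
    hits = 0
    for t in range(n - test_size, n):
        coef = fit_linear_regression(features[t - train_size:t], returns[t - train_size:t])
        predicted_up = predict(coef, features[t]) > 0
        actual_up = returns[t] > 0
        hits += predicted_up == actual_up
    return hits / test_size


if __name__ == "__main__":
    rng = random.Random(0)
    n_days = 279 + 150
    # Synthetic stand-in data: 5 features per day, return linear in them plus noise.
    feats = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(n_days)]
    rets = [0.5 * f[0] - 0.3 * f[1] + rng.gauss(0, 0.2) for f in feats]
    print(rolling_direction_accuracy(feats, rets, train_size=279, test_size=150))
```

Refitting before every test day ensures the model only ever sees past data; fitting once on a single training window is a cheaper alternative, and the abstract does not say which procedure was used.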