Summary: | Master's thesis === 國立臺中教育大學 === 教育資訊與測驗統計研究所碩士在職專班 === 105 === Numerous studies have shown that the characteristics of a dataset affect forecasting results and even overall system performance. With the rapid development of science and technology, datasets in many fields now contain tens of thousands of features, while the number of available training samples is often far smaller than the number of features. The advantages of feature selection include easier interpretation, reduced computation and storage costs, and improved model prediction accuracy through dimensionality reduction. An appropriate feature selection method can shorten sample training time and improve predictive accuracy.
When data are mapped into a high-dimensional space for grouping, adding suitable data indices or features can reduce computation time. Finding the feature subset that best characterizes the data while minimizing the within-group prediction error is an important goal of this study. Therefore, this study develops a kernel-based feature selection method suited to support vector regression, an important research topic in feature selection.
In this study, we used datasets from the UCI Machine Learning Repository together with semantic-space lexical data from primary and secondary schools, and validated the feature subsets obtained by the forward feature selection method, the backward feature selection method, and the proposed kernel-function feature selection method.
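As an illustration of the wrapper-style selection procedures named above, the sketch below runs forward feature selection around a support vector regression model and compares cross-validated error before and after selection. The synthetic dataset, RBF kernel, and parameter values are illustrative assumptions, not the thesis's actual experimental settings.

```python
# Hedged sketch: forward feature selection wrapped around support vector
# regression (SVR), scored by cross-validated mean squared error.
# All data and parameters here are assumed for demonstration only.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic data: 200 samples, 20 features, only 5 of them informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

# RBF kernel implicitly maps the data into a high-dimensional space.
svr = SVR(kernel="rbf")

# Forward selection: greedily add the feature that most reduces CV error.
selector = SequentialFeatureSelector(svr, n_features_to_select=5,
                                     direction="forward", cv=5,
                                     scoring="neg_mean_squared_error")
selector.fit(X, y)
X_reduced = selector.transform(X)

full_mse = -cross_val_score(svr, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
reduced_mse = -cross_val_score(svr, X_reduced, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
print(f"features kept: {X_reduced.shape[1]}, "
      f"MSE (all features): {full_mse:.3f}, "
      f"MSE (selected features): {reduced_mse:.3f}")
```

Backward selection follows the same pattern with `direction="backward"`, starting from the full feature set and greedily dropping features instead.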
The results show that support vector regression combined with the proposed kernel feature selection method achieves a smaller error rate.
Support vector regression outperforms linear and nonlinear regression in data prediction: it not only reduces data dimensionality effectively but also yields smaller prediction errors. General datasets contain a wide variety of features. Feature selection can eliminate redundant features and non-influential noise, effectively removing low-impact feature values and reducing the dimensionality of the feature vectors that represent the data, thereby improving predictive performance. If applied widely across fields to reduce the dimensionality of prediction data, it would greatly enhance performance.
|