Summary: | Doctoral === National Cheng Kung University === Department of Industrial and Information Management === 102 === Virtual sample generation approaches have been used in a number of studies to improve learning performance on small data sets. Appropriate estimation of the data distribution plays an important role in this process, and the resulting performance is usually better for data sets with a simple distribution than for those with a complex one. However, mixed-type data sets often have a multi-modal distribution rather than a simple, uni-modal one. To address this problem, this study assumes that a data set follows a two-parameter Weibull distribution and proposes the Maximal P-Value method to estimate the two Weibull parameters, so that a nonlinear, asymmetrical distribution can be constructed from small data. Further, this study proposes a new approach to detect multi-modality in data sets, to avoid inappropriately using a uni-modal distribution. The approach uses the common k-means clustering method to detect possible clusters and, based on the clustered sample sets, estimates a Weibull variate for each cluster to produce multi-modal virtual data. The degree of error variation in the Weibull skewness between the original and virtual data is measured and used as the criterion for determining the sizes of the virtual samples. Simulated data sets and two practical examples are used to demonstrate that the Maximal P-Value method is a more appropriate technique for increasing the accuracy of distribution estimation with small sample sizes. In addition, six data sets with different training data sizes are employed to check the performance of the proposed method, with comparisons based on classification accuracy. Finally, non-parametric tests on the experimental results show that the proposed method achieves better classification performance than the Mega-Trend-Diffusion method.
|
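The pipeline described in the abstract (cluster with k-means, fit a two-parameter Weibull distribution to each cluster, then draw virtual values from each fit) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say which test yields the p-value in the Maximal P-Value method, so the sketch assumes a Kolmogorov-Smirnov test and picks parameters by grid search. The function names, the grid ranges, the fixed number of virtual values per cluster, and the toy data are all hypothetical, and the skewness-based criterion for choosing the virtual sample size is not reproduced. The data are assumed to be positive scalars.

```python
import math
import random


def weibull_cdf(x, shape, scale):
    """CDF of the two-parameter Weibull distribution (x >= 0)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def ks_statistic(sample, shape, scale):
    """Kolmogorov-Smirnov distance between the sample and a Weibull CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = weibull_cdf(x, shape, scale)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_weibull_max_p(sample, shapes, scales):
    """Grid search for the (shape, scale) pair with the smallest KS distance.

    Assumption: for a fixed sample size the KS p-value falls as the distance
    grows, so the smallest distance corresponds to the largest p-value.
    """
    return min(
        ((k, lam) for k in shapes for lam in scales),
        key=lambda p: ks_statistic(sample, p[0], p[1]),
    )


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns the non-empty clusters."""
    xs = sorted(values)
    # Deterministic start: spread the initial centers over the sorted data.
    centers = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]


def virtual_samples(values, k, per_cluster, rng):
    """Fit one Weibull per cluster and draw virtual values from each fit."""
    out = []
    for cluster in kmeans_1d(values, k):
        hi = max(cluster)
        shapes = [0.5 + 0.25 * i for i in range(79)]          # 0.5 to 20
        scales = [hi * (0.3 + 0.02 * i) for i in range(61)]   # 0.3*hi to 1.5*hi
        shape, scale = fit_weibull_max_p(cluster, shapes, scales)
        # random.weibullvariate takes (scale, shape), in that order.
        out.extend(rng.weibullvariate(scale, shape) for _ in range(per_cluster))
    return out


# Toy bimodal data set (made up for illustration only).
small = [1.8, 2.1, 1.9, 2.3, 2.0, 9.5, 10.2, 9.9, 10.4, 10.0]
virtual = virtual_samples(small, k=2, per_cluster=50, rng=random.Random(0))
```

Fitting each cluster separately is what lets the virtual data keep both modes. A single Weibull fitted to all ten toy values would place much of its probability mass between the two groups, where no observations lie.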