The Impacts of Dimension Extension, Postprocess, and Fuzzy Sets to Support Vector Machines on Predicting Ordinal Classes
Title (Chinese): 資料擴充、後處理運算、及模糊集合對支持向量機預測次序類別資料之影響
Author: Yu-ling Yen (顏伃伶)
Advisor: Wen-Feng Hsiao (蕭文峰)
Degree: Master's, 資訊管理系 (Department of Information Management), 國立屏東商業技術學院 (National Pingtung Institute of Commerce), academic year 97 (ROC)
Format: Others (學位論文 / thesis)
Language: zh-TW
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/83288271911324824877
Description:
The prediction of ordinal scale data is a long-standing, difficult, and unsolved problem in machine learning and data mining research. The Support Vector Machine (SVM) has proven itself a robust, well-performing algorithm and has been applied successfully to multi-class classification and regression problems. Several researchers have therefore tried to apply it to ordinal-class prediction. Among them, some treat ordinal values as a continuous metric and approach the problem as ordinal regression; others treat ordinal values as discrete categories and approach it as ordinal classification. We believe both views have their pros and cons, and we favor an adaptive view: ordinal classification should be adopted when the number of class values is small, and ordinal regression when the number of class values is large. Owing to time limitations, however, this research focuses only on the ordinal classification problem.
Among the ordinal classification studies, Cardoso and da Costa (2007) proposed a method, called the ordinal support vector machine (oSVM), in which the data are replicated into higher dimensions by inserting dummy variables, and each replica's output is redefined as +1 or -1 for binary classification according to its position on the ladder of classes. In this way, a single support vector machine can simultaneously learn the margins between every pair of neighboring classes. Their experiments showed that the method outperforms other methods in prediction accuracy. Because their concept is straightforward and convincing, this research builds on their method and proposes several improvements.
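To make the replication idea concrete, the sketch below shows one minimal way such a dataset expansion can be realized. It is an illustrative Python rendering of the general scheme, not the thesis's code; the dummy-axis encoding and the constant h are our assumptions.

```python
import numpy as np

def replicate_ordinal(X, y, K, h=1.0):
    """Sketch of the data-replication reduction for ordinal classes.

    X : (n, p) inputs; y : (n,) integer labels in {1, ..., K}.
    Each sample is copied once per class boundary q = 1..K-1; copy q
    lives in R^{p + K - 2} and is labeled -1 if y <= q, else +1, so a
    single binary SVM learns all K-1 neighboring-class margins at once.
    """
    n, p = X.shape
    X_ext, y_ext = [], []
    for q in range(1, K):                 # one replica per class boundary
        dummies = np.zeros((n, max(K - 2, 0)))
        if q >= 2:                        # replica 1 keeps all-zero dummies
            dummies[:, q - 2] = h         # shift replica q along dummy axis q-2
        X_ext.append(np.hstack([X, dummies]))
        y_ext.append(np.where(y <= q, -1.0, 1.0))
    return np.vstack(X_ext), np.concatenate(y_ext)
```

Training one standard binary SVM on the returned pair then yields the K-1 parallel boundaries that can be read back as ordinal thresholds.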
By empirically testing oSVM on several datasets, we found that when the input attributes and the output attribute have only a partial functional relationship, or none at all (as in real datasets), oSVM usually does not perform well: its errors typically misclassify class x as some class y that differs from x by two or more ranks. When the inputs and the output have an obvious functional relationship (as in the synthetic dataset), oSVM usually performs better, and its errors are mostly near-class misclassifications, i.e., classifying class x as class x-1 or class x+1 (one rank lower or higher). We believe the first kind of error arises because the dataset contains too much noise for oSVM to learn a good hyperplane; past research has pointed out that SVMs are easily degraded by noisy data and that fuzzy set theory is well suited to mitigating noise. We believe the second kind of error is caused by the vagueness of class boundaries and can therefore be relieved by defining a degree of class membership for each instance.
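The two error types can be separated mechanically by tallying the rank differences between predictions and true labels; a small helper of this kind (our own illustration, not code from the thesis) also yields the error rate and mean absolute error used later to compare methods:

```python
import numpy as np

def ordinal_error_profile(y_true, y_pred):
    """Tally rank differences between true and predicted ordinal labels.
    Separates near-class errors (off by one rank) from far-class errors
    (off by two or more ranks) and reports the error rate and the mean
    absolute error (MAE).  Illustrative helper, not code from the thesis."""
    diff = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    return {
        "error_rate": float(np.mean(diff > 0)),
        "near_class": float(np.mean(diff == 1)),  # class x -> x-1 or x+1
        "far_class":  float(np.mean(diff >= 2)),  # off by two or more ranks
        "mae":        float(np.mean(diff)),
    }
```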
To resolve the two problems above, we designed three fuzzy membership functions.

First, to counter noise, we define a centroid point for each class, whose attributes are obtained by taking either the mean or the median of each attribute over all instances in that class, to represent the class's core concept. The closer a sample is to the centroid (by Euclidean distance), the more representative it is, and the closer its membership degree is to 1; the farther a sample is from the centroid, the more likely it is noise, and the closer its membership degree is to 0.

The second membership function further considers the distance of a sample to the discriminant hyperplane of the SVM. We define the membership degree as the distance of the sample to the hyperplane divided by the distance of the centroid to the hyperplane, then clip the value to [0, 1]: it is set to 1 if larger than 1 and to 0 if smaller than 0. This effectively uses the centroid as the splitting point. Any sample farther from the hyperplane than the centroid receives membership degree 1, and any sample on the opposite side of the hyperplane from the centroid receives membership degree 0 (such samples are deemed noise, so their importance is set to 0). The rationale is that only points near the hyperplane are likely to be noise.

Second, to resolve the vagueness between class boundaries (the second error type), our third membership function assigns each instance a membership degree for every class by defining a fuzzy width that controls the span of a trapezoidal membership function. The class centroid is set as the center of the trapezoid; expanding it left and right by a fuzzy width (between 0.5 and 1.5 standard deviations of the distances of all points to the hyperplane) gives the upper side of the trapezoid, and expanding further to the centroids of the nearest classes (one on each side) gives the lower side. The membership degree is read off from the instance's position on this trapezoidal function.
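The following Python sketches restate the three functions as described above; the variable names, the normalization in the first function, and the one-sided trapezoid parametrization are our illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def membership_centroid(x, centroid, d_max):
    """MF1: closer to the class centroid -> degree nearer 1.
    d_max is an assumed normalizing distance, e.g. the distance of the
    farthest class member from the centroid."""
    d = np.linalg.norm(np.asarray(x) - np.asarray(centroid))
    return float(np.clip(1.0 - d / d_max, 0.0, 1.0))

def membership_hyperplane(dist_sample, dist_centroid):
    """MF2: signed distance of the sample to the SVM hyperplane divided
    by the centroid's distance, clipped to [0, 1].  A sample on the
    wrong side of the hyperplane has a negative ratio, so it is clipped
    to 0 and treated as noise."""
    return float(np.clip(dist_sample / dist_centroid, 0.0, 1.0))

def membership_trapezoid(d, width, d_neighbor):
    """MF3: one side of a trapezoid centered on the class centroid.
    d          : distance of the instance from the centroid on this side
    width      : fuzzy width (0.5-1.5 std of point-to-hyperplane distances)
    d_neighbor : distance from the centroid to this side's nearest
                 neighboring centroid (assumed to exceed width)."""
    if d <= width:
        return 1.0                      # on the trapezoid's flat top
    if d >= d_neighbor:
        return 0.0                      # at or beyond the neighboring centroid
    return float((d_neighbor - d) / (d_neighbor - width))  # linear slope
```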
The experimental results show that our proposed method, the Fuzzy Ordinal Support Vector Machine, effectively reduces both the error rate and the mean absolute error of oSVM, and performs better than the traditional OrdinalClassifier and Support Vector Regression methods. Among the three membership functions, the third, whose fuzzy-width parameter allows further fine-tuning of the classifier, performs best, and the second is better than the first in most cases. However, we also find that none of the three membership functions has a salient effect on reducing the second error type. Future work could therefore try to build a boosting-like model to further improve prediction accuracy.