Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
Master's degree === Chung Yuan Christian University === Graduate Institute of Electronic Engineering === 97 === Abstract Data description and classification are interesting and important tasks which are applied widely in supervised learning. In this thesis, three supervised learning methods are considered: k-Nearest Neighbor (k-NN), Support Vector Data Description (SVDD) and...
Main Authors: | Yugowati Praharsi, 游華英 |
---|---|
Other Authors: | Shaou-Gang Miaou |
Format: | Others |
Language: | en_US |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/17100265072485850834 |
id | ndltd-TW-097CYCU5428062 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-097CYCU5428062 2015-10-13T12:04:54Z http://ndltd.ncl.edu.tw/handle/17100265072485850834 Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes (Chinese title: Supervised Learning Methods and Feature Selection for Detecting Diabetes) Yugowati Praharsi 游華英 Master's degree Chung Yuan Christian University Graduate Institute of Electronic Engineering 97 Abstract Data description and classification are interesting and important tasks that are widely applied in supervised learning. This thesis considers three supervised learning methods: k-Nearest Neighbor (k-NN), Support Vector Data Description (SVDD), and Support Vector Machine (SVM). Feature selection in supervised learning is useful for finding a feature subset that yields higher classification accuracy. This thesis considers both a forward-selection-based wrapper approach and a correlation-based filter approach. The correlation between each feature and the class label is measured using entropy and information gain (IG), while feature-feature correlation is calculated using the Pearson correlation coefficient. The study compares the performance of the three classifiers (k-NN, SVDD, and SVM) with and without feature selection. The classifiers using the proposed feature selection methods are expected to perform better than those without feature selection. In addition, the selected feature subset can be used to describe the data structure regardless of the classifier type or feature selection method used. The dataset used is the Pima Indians Diabetes dataset from the UCI Machine Learning Repository. The results show that forward feature selection produces the best feature subset for SVM and 5-NN. Feature selection based on the mean information gain and a standard deviation threshold gives the best result for the 1-NN classifier, and this selection method can be considered a substitute for forward selection: it is computationally efficient, and compared with forward selection its accuracy does not decrease significantly for SVM and 5-NN. Finally, among the eight candidate features, glucose level is the most prominent feature for diabetes detection across all classifiers and feature selection methods considered. The IG relevance measure can be used to rank features from the most important to the least significant, which can be very useful in medical applications such as prioritizing features for symptom recognition. Shaou-Gang Miaou 繆紹綱 2009 degree thesis ; thesis 62 en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description | Master's degree === Chung Yuan Christian University === Graduate Institute of Electronic Engineering === 97 === Abstract
Data description and classification are interesting and important tasks that are widely applied in supervised learning. This thesis considers three supervised learning methods: k-Nearest Neighbor (k-NN), Support Vector Data Description (SVDD), and Support Vector Machine (SVM).
Feature selection in supervised learning is useful for finding a feature subset that yields higher classification accuracy. This thesis considers both a forward-selection-based wrapper approach and a correlation-based filter approach. The correlation between each feature and the class label is measured using entropy and information gain (IG), while feature-feature correlation is calculated using the Pearson correlation coefficient. The study compares the performance of the three classifiers (k-NN, SVDD, and SVM) with and without feature selection. The classifiers using the proposed feature selection methods are expected to perform better than those without feature selection. In addition, the selected feature subset can be used to describe the data structure regardless of the classifier type or feature selection method used.
The dataset used is the Pima Indians Diabetes dataset from the UCI Machine Learning Repository. The results show that forward feature selection produces the best feature subset for SVM and 5-NN. Feature selection based on the mean information gain and a standard deviation threshold gives the best result for the 1-NN classifier, and this selection method can be considered a substitute for forward selection: it is computationally efficient, and compared with forward selection its accuracy does not decrease significantly for SVM and 5-NN.
Finally, among the eight candidate features, glucose level is the most prominent feature for diabetes detection across all classifiers and feature selection methods considered. The IG relevance measure can be used to rank features from the most important to the least significant, which can be very useful in medical applications such as prioritizing features for symptom recognition.
|
author2 | Shaou-Gang Miaou |
author | Yugowati Praharsi 游華英 |
title | Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes |
publishDate | 2009 |
url | http://ndltd.ncl.edu.tw/handle/17100265072485850834 |
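The abstract ranks features by their information gain (IG) with the class label. The following is a minimal Python sketch of that ranking, assuming the features have already been discretized into bins (the discretization used in the thesis is not described in this record). The function names and the toy columns `glucose_bin` and `age_bin` are hypothetical and are not data from the thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v), for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_by_ig(columns, labels):
    """Return (feature name, IG) pairs sorted from most to least relevant."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = diabetic, 0 = non-diabetic.
    labels = [1, 1, 1, 0, 0, 0]
    columns = {
        "glucose_bin": ["high", "high", "high", "low", "low", "low"],  # separates the classes
        "age_bin": ["old", "young", "old", "young", "old", "young"],  # only weakly related
    }
    for name, ig in rank_by_ig(columns, labels):
        print(f"{name}: {ig:.3f}")
```

On this toy input `glucose_bin` ranks first, since knowing its value removes all uncertainty about the label. Sorting by IG in this way is the kind of most-to-least-important ordering the abstract describes.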
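The record does not spell out how the filter combines the mean IG, a standard-deviation threshold, and Pearson feature-feature correlation. The sketch below shows one plausible reading, stated as an assumption: a feature is relevant if its IG is at least `mean + alpha * std` of all IG scores, and a relevant feature is then dropped as redundant if its absolute Pearson correlation with an already-kept, higher-IG feature exceeds `r_max`. The parameters `alpha` and `r_max`, their defaults, and the toy scores are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / (sx * sy)


def filter_select(ig_scores, raw_columns, alpha=0.0, r_max=0.7):
    """Hypothetical IG + Pearson filter (one possible reading of the abstract).

    ig_scores   -- dict: feature name -> information gain with the class label
    raw_columns -- dict: feature name -> numeric values, used for feature-feature r
    """
    scores = list(ig_scores.values())
    threshold = mean(scores) + alpha * pstdev(scores)  # relevance cut-off
    # Relevant features, visited from highest to lowest IG.
    relevant = sorted(
        (f for f, s in ig_scores.items() if s >= threshold),
        key=lambda f: ig_scores[f],
        reverse=True,
    )
    selected = []
    for f in relevant:
        # Keep f only if it is not strongly correlated with any feature already kept.
        if all(abs(pearson(raw_columns[f], raw_columns[g])) <= r_max for g in selected):
            selected.append(f)
    return selected


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the thesis.
    ig = {"glucose": 0.90, "bmi": 0.60, "age": 0.55, "skin": 0.05}
    raw = {
        "glucose": [1, 2, 3, 4],
        "bmi": [2, 4, 6, 8],  # perfectly correlated with glucose, so redundant
        "age": [4, 1, 3, 2],  # r = -0.4 with glucose
        "skin": [1, 1, 2, 2],
    }
    print(filter_select(ig, raw))
```

Raising `alpha` makes the relevance cut stricter, and lowering `r_max` removes more mutually correlated features; both would need tuning on real data. Because no classifier is trained, a filter like this is far cheaper than a wrapper search.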
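The forward-selection wrapper evaluates candidate feature subsets with the classifier itself. Below is a minimal sketch using a pure-Python k-NN classifier scored by leave-one-out accuracy. The thesis's actual evaluation protocol (for example, its cross-validation scheme) is not given in this record, so the leave-one-out choice, the stop-when-no-improvement rule, the function names, and the toy data are all assumptions.

```python
from collections import Counter
from math import dist  # Python 3.8+


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, features, k):
    """Leave-one-out accuracy of k-NN restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        query = [X[i][f] for f in features]
        correct += knn_predict(train_X, train_y, query, k) == y[i]
    return correct / len(X)


def forward_selection(X, y, k=1):
    """Greedy wrapper: add the feature that most improves accuracy; stop when none helps.

    Ties in accuracy go to the higher feature index, because max() compares
    (accuracy, index) tuples.
    """
    remaining = list(range(len(X[0])))
    selected, best_acc = [], 0.0
    while remaining:
        acc, f = max((loo_accuracy(X, y, selected + [f], k), f) for f in remaining)
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
    X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [5.0, 4.9], [5.2, 1.1], [4.8, 3.2]]
    y = [0, 0, 0, 1, 1, 1]
    print(forward_selection(X, y, k=1))
```

Calling `forward_selection(X, y, k=5)` gives the 5-NN variant mentioned in the abstract. Every candidate subset requires re-running the classifier over the whole dataset, which is why the wrapper is more expensive than an IG-based filter.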