Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes

Master's === Chung Yuan Christian University === Graduate Institute of Electronic Engineering === 97 === Abstract: Data description and classification are important tasks that are widely applied in supervised learning. This thesis considers three supervised learning methods: k-Nearest Neighbor (k-NN), Support Vector Data Description (SVDD) and...


Bibliographic Details
Main Authors:	Yugowati Praharsi, 游華英
Other Authors:	Shaou-Gang Miaou
Format:	Others
Language:	en_US
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/17100265072485850834

id	ndltd-TW-097CYCU5428062
record_format	oai_dc
spelling	ndltd-TW-097CYCU5428062 2015-10-13T12:04:54Z http://ndltd.ncl.edu.tw/handle/17100265072485850834 Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes 用於偵測糖尿病的監督式學習法和特徵選取 (Supervised Learning Methods and Feature Selection for Diabetes Detection) Yugowati Praharsi 游華英 Master's, Chung Yuan Christian University, Graduate Institute of Electronic Engineering, 97 Shaou-Gang Miaou 繆紹綱 2009 thesis 62 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === Chung Yuan Christian University === Graduate Institute of Electronic Engineering === 97 === Abstract: Data description and classification are important tasks that are widely applied in supervised learning. This thesis considers three supervised learning methods: k-Nearest Neighbor (k-NN), Support Vector Data Description (SVDD), and Support Vector Machine (SVM). Feature selection in supervised learning is useful for finding a feature subset that yields higher classification accuracy. Two approaches are considered: a wrapper based on forward selection and a filter based on correlation. Correlation between each feature and the class label is measured using entropy and information gain (IG), while feature-feature correlation is calculated using the Pearson correlation coefficient. The study compares the performance of the three classifiers (k-NN, SVDD, and SVM) with and without feature selection. The classifiers with the proposed feature selection methods are expected to perform better than those without feature selection. In addition, the selected feature subset can be used to describe the data structure regardless of the classifier type or feature selection method used. The data set is the Pima Indians Diabetes data set from the UCI database. The results show that forward feature selection produces the best feature subset for SVM and 5-NN. Feature selection based on mean information gain and a standard deviation threshold gives the best result for the 1-NN classifier and can be considered a substitute for forward selection: it is computationally efficient, and its accuracy for SVM and 5-NN does not decrease significantly compared with forward selection. Finally, among the eight candidate features, glucose level is the most prominent feature for diabetes detection across all classifiers and feature selection methods considered. The IG relevance measure can be used to rank features from the most important to the least significant, which can be very useful in medical applications such as prioritizing features for symptom recognition.
author2	Shaou-Gang Miaou
author_facet	Shaou-Gang Miaou Yugowati Praharsi 游華英
author	Yugowati Praharsi 游華英
spellingShingle	Yugowati Praharsi 游華英 Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
author_sort	Yugowati Praharsi
title	Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
title_short	Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
title_full	Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
title_fullStr	Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
title_full_unstemmed	Supervised Learning Approaches and Feature Selection - A Case Study in Diabetes
title_sort	supervised learning approaches and feature selection - a case study in diabetes
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/17100265072485850834
work_keys_str_mv	AT yugowatipraharsi supervisedlearningapproachesandfeatureselectionacasestudyindiabetes AT yóuhuáyīng supervisedlearningapproachesandfeatureselectionacasestudyindiabetes AT yugowatipraharsi yòngyúzhēncètángniàobìngdejiāndūshìxuéxífǎhétèzhēngxuǎnqǔ AT yóuhuáyīng yòngyúzhēncètángniàobìngdejiāndūshìxuéxífǎhétèzhēngxuǎnqǔ
_version_	1716852185301516288
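
Illustrative sketch (not taken from the thesis): the abstract describes ranking features by information gain (IG) and keeping those that pass a threshold built from the mean and standard deviation of the IG values. The Python below is one possible reading of that idea. The exact threshold rule is not given in this record, so the rule "keep features with IG >= mean + k * std" and the parameter k are assumptions, as are the equal-width binning of numeric features, the function names, and the toy data, which is made up and is not the Pima Indians data.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """IG of the class label given one numeric feature.

    The feature is discretized with equal-width bins (an assumption;
    the record does not say how the thesis discretizes features).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def select_by_ig(features, labels, k=0.0, bins=4):
    """Rank features by IG and keep those with IG >= mean + k * std.

    The cutoff rule and the parameter k are hypothetical.
    Returns (all features ranked by IG, features that pass the cutoff).
    """
    gains = {name: information_gain(col, labels, bins)
             for name, col in features.items()}
    scores = list(gains.values())
    cutoff = mean(scores) + k * pstdev(scores)
    ranked = sorted(gains, key=gains.get, reverse=True)
    selected = [name for name in ranked if gains[name] >= cutoff]
    return ranked, selected


# Made-up toy data: "glucose" separates the two classes, "noise" does not.
toy_features = {
    "glucose": [85, 90, 95, 100, 150, 160, 170, 180],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
}
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

ranked, selected = select_by_ig(toy_features, toy_labels)
print("ranked by IG:", ranked)
print("selected:", selected)
```

A filter of this kind scores each feature once, which is why it is cheaper than a forward-selection wrapper that retrains the classifier for every candidate subset.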


Similar Items