Comparative study of Kernel based classification and feature selection methods with gene expression data

Gene expression profiles obtained by high-throughput techniques such as microarrays provide a snapshot of the expression values of up to tens of thousands of genes in a particular tissue sample. Analyzing such gene expression data can be quite cumbersome as the sample size is small, the dimensionality is high,...

Bibliographic Details
Main Author: Tan, Mingyue
Language: English
Published: 2010
Online Access: http://hdl.handle.net/2429/18337
id ndltd-UBC-oai-circle.library.ubc.ca-2429-18337
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-18337 2018-01-05T17:39:17Z Comparative study of Kernel based classification and feature selection methods with gene expression data Tan, Mingyue Science, Faculty of Computer Science, Department of Graduate 2010-01-16T18:51:22Z 2010-01-16T18:51:22Z 2006 2006-05 Text Thesis/Dissertation http://hdl.handle.net/2429/18337 eng For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
collection NDLTD
language English
sources NDLTD
description Gene expression profiles obtained by high-throughput techniques such as microarrays provide a snapshot of the expression values of up to tens of thousands of genes in a particular tissue sample. Analyzing such gene expression data can be quite cumbersome as the sample size is small, the dimensionality is high, and the data are occasionally noisy. Kernel methods such as Support Vector Machines (SVMs) [5, 45] have been extensively applied within the field of gene expression analysis, particularly to the problems of gene classification and selection. In general, kernel methods outperform other approaches due to their ability to handle high dimensionality easily. In this thesis, we perform a comparative study of various state-of-the-art kernel-based classification and feature selection methods with gene expression data. Our aim is to bring all the results together in one place so that readers can easily see the similarities and differences among the methods, both theoretically and empirically. In the literature, a feature selection method is evaluated by the classification accuracy obtained using the features it selects. This evaluation criterion measures the classification capability of the data after irrelevant features have been eliminated. We propose another criterion, called stability, to evaluate feature selection methods in addition to classification accuracy. The feature set selected by a stable feature selection algorithm should not change significantly when small changes are made to the training data. In this thesis, we use both evaluation criteria to compare feature selection methods. It has been shown that cross-validation can be used to improve feature selection methods in terms of classification accuracy [8].
In this thesis, we extend one existing feature selection method, which uses Gaussian Processes (GP) [47] with Automatic Relevance Determination (ARD) [28, 34], by combining it with cross-validation, and propose a new feature selection method. Experiments on real gene expression data sets show that our method outperforms all other feature selection methods in terms of classification accuracy, and achieves stability comparable to that of Sparse Multinomial Logistic Regression (SMLR) [23], the most stable feature selection method in the literature. === Science, Faculty of === Computer Science, Department of === Graduate
author Tan, Mingyue
title Comparative study of Kernel based classification and feature selection methods with gene expression data
publishDate 2010
url http://hdl.handle.net/2429/18337