Summary: | Thesis (Ph.D.)--University of Hawaii at Manoa, 2008. === DNA microarray technology has given researchers a high-throughput means of simultaneously measuring expression levels for thousands of genes in a single experiment. Using a probit regression setting, and assuming that the link function between the significant gene expression data and the latent variable for the response label is a Gaussian process, we build a kernel-induced hierarchical Bayesian framework for a cancer classification problem using microarray gene expression data. === In summary, this study presents a kernel-induced hierarchical Bayesian framework, built on a Gaussian process model, that uses microarray gene expression data for a multi-class cancer classification problem. Our main contribution is a fully automated learning algorithm to solve this Bayesian model. Satisfactory results were achieved on both the simulated examples and the real-world datasets. === Six published microarray datasets were analyzed in this study. The results show that the predictive performance of our method on all of these datasets is better than, or at least as good as, that of other state-of-the-art microarray analysis methods. Our method shows a particular advantage on one dataset that contains multiple suspected mislabeled samples. For each dataset, we identified a set of significant genes that can be used for further biological inspection at the genome level. === Targeting a multi-class classification problem and adopting a variable selection approach with a Gibbs sampler at its core, we developed an algorithm, the kernel-imbedded Gaussian Process (KIGP), to analyze microarray data under a Bayesian framework. Using a feature projection procedure and a univariate ranking scheme as the gene-selection strategy, we further designed an alternative microarray analysis model, the natural kernel-imbedded Gaussian Process (NKIGP). 
Finally, we present an efficient algorithm with a cascading structure, embedded with a reversible jump Markov chain Monte Carlo (RJMCMC) model, that unifies the methods proposed in this study. === The simulated examples demonstrate that our method almost always performs close to the Bayesian bound, both in cases with linear Bayesian classifiers and in cases with highly non-linear Bayesian classifiers. Our method remains robust even with mislabeled training samples, which suggests that it is broadly applicable to microarray analysis problems on which linear methods may perform unreliably. === Includes bibliographical references (leaves xxx-xxx). === Also available by subscription via World Wide Web === 179 leaves, bound 29 cm
|