Novel Methods of Biomarker Discovery and Predictive Modeling using Random Forest
abstract: Random forest (RF) is a popular and powerful machine-learning technique. It can be used for classification, regression and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is a predictive model that generates predictions for new observations. Recent research has...
Other Authors: | Guan, Xin (Author) |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: | 2017 |
Subjects: | Biostatistics; feature selection; prediction interval; predictive modeling; random forest |
Online Access: | http://hdl.handle.net/2286/R.I.45017 |
id |
ndltd-asu.edu-item-45017 |
---|---|
record_format |
oai_dc |
contributors |
Guan, Xin (Author); Liu, Li (Advisor); Runger, George (Advisor); Dinu, Valentin (Committee member) |
publisher |
Arizona State University |
physical_description |
119 pages |
rights |
http://rightsstatements.org/vocab/InC/1.0/ All Rights Reserved 2017 |
collection |
NDLTD |
language |
English |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Biostatistics feature selection prediction interval predictive modeling random forest |
description |
abstract: Random forest (RF) is a popular and powerful machine-learning technique. It can be used for classification, regression and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is a predictive model that generates predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals; however, these methods are limited in applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and is used as the basis for two novel methods: one for biomarker discovery and one for generating prediction intervals.
First, an RF-based biodosimetry method was developed to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve its prediction accuracy, day-specific models were built to handle the day interaction effect, and a nested-modeling technique was proposed. The nested models can fit these complex data, which have large variability and non-linear relationships.
Second, a panel of biomarkers was selected by combining a data-driven feature selection method with hand-picking based on prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF (GRRF). It incorporates domain knowledge as a penalty term that regulates the selection of candidate features in RF, which adds flexibility to data-driven feature selection and can improve model interpretability. Know-GRRF significantly improved cross-species prediction when cross-species correlation was used to guide biomarker selection. On simulated datasets, when intrinsic data characteristics are used in place of domain knowledge, the method also performs competitively with existing methods.
Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. The method is applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets. Dissertation/Thesis: Doctoral Dissertation, Biomedical Informatics, 2017 |
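The day-specific modeling described in the abstract (one model per sampling day, so that the expression-to-dose relationship can differ by day) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's code: to stay dependency-free it uses an ordinary least-squares line as a hypothetical stand-in for an RF regressor, and the names `DaySpecificModel` and `fit_line` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Hypothetical stand-in for an RF regressor, used only to keep the sketch
    dependency-free.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # flat line if all x values are equal
    return y_bar - slope * x_bar, slope


class DaySpecificModel:
    """Fits one sub-model per sampling day, so the expression-to-dose
    relationship is allowed to differ between days (a day interaction)."""

    def __init__(self):
        self.models = {}

    def fit(self, days, xs, ys):
        # Partition the training rows by day: {day: ([x...], [y...])}.
        groups = defaultdict(lambda: ([], []))
        for day, x, y in zip(days, xs, ys):
            groups[day][0].append(x)
            groups[day][1].append(y)
        self.models = {day: fit_line(gx, gy) for day, (gx, gy) in groups.items()}
        return self

    def predict(self, day, x):
        # Dispatch to the sub-model for this day; raises KeyError for an unseen day.
        intercept, slope = self.models[day]
        return intercept + slope * x


if __name__ == "__main__":
    # Toy data: the same expression level maps to different doses on day 1 and day 2.
    model = DaySpecificModel().fit(
        [1, 1, 1, 2, 2, 2],
        [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0, 1.0, 2.0, 3.0],
    )
    print(model.predict(1, 2.5), model.predict(2, 2.5))  # → 5.0 2.5
```

A single pooled model would average the two days' slopes; splitting by day lets each day keep its own relationship, at the cost of fewer training samples per sub-model.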
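Guided regularized RF penalizes the split gain of any feature not already used at earlier splits, so a new feature enters only when it is clearly more informative. The sketch below illustrates that mechanism with a per-feature coefficient derived from an external score. It assumes a GRRF-style form, lambda_i = (1 - gamma) * lambda0 + gamma * s_i / max_j s_j, where s_i is a domain-knowledge score; the abstract does not say how Know-GRRF maps knowledge into the penalty, so this mapping, the defaults for `lambda0` and `gamma`, and the gene names are hypothetical.

```python
def penalty_coefficients(scores, lambda0=0.5, gamma=0.5):
    """Per-feature coefficient lambda_i = (1 - gamma) * lambda0 + gamma * s_i / max(s).

    `scores` maps feature -> non-negative domain-knowledge score (hypothetical).
    With lambda0 and gamma in [0, 1], every coefficient lies in [0, 1].
    """
    top = max(scores.values())
    return {
        f: (1 - gamma) * lambda0 + gamma * (s / top if top > 0 else 0.0)
        for f, s in scores.items()
    }


def choose_split_feature(gains, used, coef):
    """Return the feature with the largest penalized split gain at one node.

    Features already in `used` (chosen at earlier splits) keep their raw gain;
    a new feature's gain is multiplied by coef[f], so it enters only if it is
    clearly better, and features with low knowledge scores are penalized more.
    """
    def penalized(f):
        return gains[f] if f in used else coef[f] * gains[f]

    return max(gains, key=penalized)


if __name__ == "__main__":
    coef = penalty_coefficients({"geneA": 0.9, "geneB": 0.1, "geneC": 0.5})
    gains = {"geneA": 0.30, "geneB": 0.40, "geneC": 0.20}
    print(choose_split_feature(gains, set(), coef))      # → geneA (geneB's higher raw gain is penalized)
    print(choose_split_feature(gains, {"geneB"}, coef))  # → geneB (already used, so unpenalized)
```

Raising `gamma` gives the knowledge score more influence over which features can enter; with `gamma = 0` every new feature receives the same penalty `lambda0`.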
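RFerr is not described in detail in this abstract, so the sketch below shows a different, generic technique: a split-conformal interval built from the quantile of absolute residuals on a held-out calibration set. It shares the stated property of working with any predictive model, but it is not RFerr. The function name `residual_interval` and the toy model and data are hypothetical, and the coverage target assumes the calibration points were not used for training and are exchangeable with future observations.

```python
import math


def residual_interval(predict, calib_x, calib_y, alpha=0.1):
    """Split-conformal style interval built from held-out absolute residuals.

    `predict` can be any fitted model's prediction function. Returns a
    function x -> (lower, upper) targeting roughly (1 - alpha) coverage.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    residuals = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration point")
    # Conformal rank ceil((n + 1)(1 - alpha)), capped at n; the cap can
    # under-cover when the calibration set is too small for the requested alpha.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = residuals[k - 1]

    def interval(x):
        y_hat = predict(x)
        return y_hat - q, y_hat + q

    return interval


if __name__ == "__main__":
    # Hypothetical fitted model and calibration data.
    interval = residual_interval(
        lambda x: 2 * x,
        [1, 2, 3, 4, 5, 6, 7, 8, 9],
        [2.5, 3.75, 6.0, 8.25, 9.5, 12.0, 14.5, 16.0, 18.25],
        alpha=0.2,
    )
    print(interval(10))  # → (19.5, 20.5)
```

This interval has the same width everywhere; methods aimed at better precision, as the abstract claims for RFerr, typically let the width vary with the input.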
author2 |
Guan, Xin (Author) |
title |
Novel Methods of Biomarker Discovery and Predictive Modeling using Random Forest |
publishDate |
2017 |
url |
http://hdl.handle.net/2286/R.I.45017 |