A two-stage random forest-based pathway analysis method.

Bibliographic Details
Main Authors: Ren-Hua Chung, Ying-Erh Chen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3346727?pdf=render
ISSN: 1932-6203
Volume/Issue: 7(5)
Article Number: e36662
DOI: 10.1371/journal.pone.0036662
Source: DOAJ
Record ID: doaj-e6176afdef684ed5a242a8234cefd097
Description: Pathway analysis provides a powerful approach for identifying the joint effect on disease of genes grouped into biologically based pathways. It is also an attractive approach for secondary analysis of genome-wide association study (GWAS) data, which may still yield new results from these valuable datasets. Most current pathway analysis methods focus on testing the cumulative main effects of genes in a pathway. However, for complex diseases, gene-gene interactions are expected to play a critical role in disease etiology. We extended a random forest-based method for pathway analysis by incorporating a two-stage design. We used simulations to verify that the proposed method has the correct type I error rates. We also used simulations to show that, in the presence of gene-gene interactions, the method is more powerful than both the original random forest-based pathway approach and the set-based test implemented in PLINK. Finally, we applied the method to a breast cancer GWAS dataset and a lung cancer GWAS dataset and identified interesting pathways that have implications for breast and lung cancers.