Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

Bibliographic Details
Main Authors: Chihyun Park, Jaegyoon Ahn, Hyunjin Kim, Sanghyun Park
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3908883?pdf=render
ISSN: 1932-6203
Volume: 9, Issue: 1, Article: e86309
DOI: 10.1371/journal.pone.0086309
Source: DOAJ

Full description

BACKGROUND: The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging because sample sizes are small compared with the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies have employed supervised approaches, which can use only the few labeled samples available. Semi-supervised learning, which also exploits unlabeled samples, is a promising alternative for this problem. However, there have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence.

RESULTS: To predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally related gene pairs. We then predicted cancer recurrence by applying a regularization approach to the constructed graph, which contains both labeled and unlabeled nodes.

CONCLUSIONS: Across three different cancer datasets, the average improvement rate in accuracy was 24.9% compared with existing supervised and semi-supervised methods. We performed functional enrichment analysis on the gene networks used for learning and found that these networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was implemented in standard C++ with the STL and is available for Linux and MS Windows. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
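The approach summarized in this record has three steps: building a graph over samples from gene expression data, using protein interaction data to keep functionally related gene pairs, and applying graph regularization to a graph with both labeled and unlabeled nodes. The record does not specify how each step is done, so the following Python code is only a minimal sketch of the general idea under stated assumptions. It is not the authors' C++ program. The input layout, the co-expression filter (`min_abs_corr`), the Gaussian edge weights (`sigma`), and all function names are hypothetical. Propagation uses the standard label-spreading iteration F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), a well-known graph regularization scheme swapped in for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical inputs for illustration (not the authors' data formats):
#   expr[gene] -> expression values, one per sample (same sample order for every gene)
#   ppi        -> (gene_a, gene_b) protein-interaction pairs
#   labels[i]  -> +1 (recurrence), -1 (no recurrence), 0 (unlabeled)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_gene_pairs(expr: Dict[str, List[float]], ppi: List[Tuple[str, str]],
                      min_abs_corr: float = 0.5) -> List[Tuple[str, str]]:
    """Keep interacting pairs whose genes are measured and co-expressed (assumed rule)."""
    return [(a, b) for a, b in ppi
            if a in expr and b in expr
            and abs(pearson(expr[a], expr[b])) >= min_abs_corr]


def build_sample_graph(expr: Dict[str, List[float]], pairs: List[Tuple[str, str]],
                       sigma: float = 1.0) -> List[List[float]]:
    """Nodes are samples; edge weight = exp(-||x_i - x_j||^2 / (2 sigma^2)) over selected genes."""
    genes = sorted({g for pair in pairs for g in pair})
    if not genes:
        raise ValueError("no gene pairs were selected")
    n = len(expr[genes[0]])
    feats = [[expr[g][i] for g in genes] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((u - v) ** 2 for u, v in zip(feats[i], feats[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def label_spreading(W: List[List[float]], labels: List[int],
                    alpha: float = 0.9, iters: int = 200) -> List[int]:
    """Iterate F <- alpha*S*F + (1 - alpha)*Y with S = D^-1/2 W D^-1/2, then take signs."""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    Y = [float(y) for y in labels]
    F = Y[:]
    for _ in range(iters):
        F = [alpha * sum(S[i][j] * F[j] for j in range(n)) + (1 - alpha) * Y[i]
             for i in range(n)]
    return [1 if f >= 0 else -1 for f in F]


# Toy data: samples 0-2 and 3-5 form two expression clusters; one sample in each is labeled.
expr = {
    "G1": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],
    "G2": [2.0, 2.1, 1.9, 6.0, 6.1, 5.9],
    "G3": [0.3, 9.0, 0.1, 4.0, 0.2, 7.0],
}
ppi = [("G1", "G2"), ("G1", "G3"), ("G2", "G4")]
labels = [1, 0, 0, -1, 0, 0]
pairs = select_gene_pairs(expr, ppi)
predicted = label_spreading(build_sample_graph(expr, pairs), labels)
print(pairs, predicted)
```

In this toy run, the pair involving the uncorrelated gene G3 and the pair with the unmeasured gene G4 are filtered out. Each unlabeled sample then takes the label of the labeled sample in its own cluster. Label spreading is used here because each sample's score is pulled toward its neighbors' scores while staying anchored to the known labels. The paper's actual graph construction and regularization objective may differ.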