A mixture copula Bayesian network model for multimodal genomic data
Gaussian Bayesian networks have become a widely used framework to estimate directed associations between joint Gaussian variables, where the network structure encodes the decomposition of multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normal...
Main Authors: | Qingyang Zhang, Xuan Shi |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2017-04-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.1177/1176935117702389 |
id |
doaj-4d0f1fb086ff4356aa7ff6f0e4d3c766 |
---|---|
record_format |
Article |
spelling |
doaj-4d0f1fb086ff4356aa7ff6f0e4d3c766; 2020-11-25T03:33:36Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2017-04-01; 16; 10.1177/1176935117702389; 10.1177_1176935117702389; A mixture copula Bayesian network model for multimodal genomic data; Qingyang Zhang (0), Xuan Shi (1); (0) Department of Mathematical Sciences, University of Arkansas, USA; (1) Department of Geosciences, University of Arkansas, USA; https://doi.org/10.1177/1176935117702389 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Qingyang Zhang Xuan Shi |
title |
A mixture copula Bayesian network model for multimodal genomic data |
publisher |
SAGE Publishing |
series |
Cancer Informatics |
issn |
1176-9351 |
publishDate |
2017-04-01 |
description |
Gaussian Bayesian networks have become a widely used framework to estimate directed associations between joint Gaussian variables, where the network structure encodes the decomposition of multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normality assumption is moderately or severely violated, making it unsuitable for dealing with recent genomic data such as the Cancer Genome Atlas data. In the present paper, we propose a mixture copula Bayesian network model which provides great flexibility in modeling non-Gaussian and multimodal data for causal inference. The parameters in mixture copula functions can be efficiently estimated by a routine expectation–maximization algorithm. A heuristic search algorithm based on Bayesian information criterion is developed to estimate the network structure, and prediction can be further improved by the best-scoring network out of multiple predictions from random initial values. Our method outperforms Gaussian Bayesian networks and regular copula Bayesian networks in terms of modeling flexibility and prediction accuracy, as demonstrated using a cell signaling data set. We apply the proposed methods to the Cancer Genome Atlas data to study the genetic and epigenetic pathways that underlie serous ovarian cancer. |
url |
https://doi.org/10.1177/1176935117702389 |