A mixture copula Bayesian network model for multimodal genomic data

Gaussian Bayesian networks have become a widely used framework for estimating directed associations between jointly Gaussian variables, where the network structure encodes the decomposition of the multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normal...


Bibliographic Details
Main Authors: Qingyang Zhang, Xuan Shi
Format: Article
Language: English
Published: SAGE Publishing 2017-04-01
Series: Cancer Informatics
Online Access: https://doi.org/10.1177/1176935117702389
id doaj-4d0f1fb086ff4356aa7ff6f0e4d3c766
record_format Article
spelling doaj-4d0f1fb086ff4356aa7ff6f0e4d3c766 | 2020-11-25T03:33:36Z | eng | SAGE Publishing | Cancer Informatics | ISSN 1176-9351 | 2017-04-01 | volume 16 | 10.1177/1176935117702389 | 10.1177_1176935117702389 | A mixture copula Bayesian network model for multimodal genomic data | Qingyang Zhang (Department of Mathematical Sciences, University of Arkansas, USA) | Xuan Shi (Department of Geosciences, University of Arkansas, USA) | https://doi.org/10.1177/1176935117702389
collection DOAJ
language English
format Article
sources DOAJ
author Qingyang Zhang
Xuan Shi
title A mixture copula Bayesian network model for multimodal genomic data
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2017-04-01
description Gaussian Bayesian networks have become a widely used framework for estimating directed associations between jointly Gaussian variables, where the network structure encodes the decomposition of the multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normality assumption is moderately or severely violated, which makes the framework unsuitable for recent genomic data such as The Cancer Genome Atlas data. In the present paper, we propose a mixture copula Bayesian network model that provides great flexibility in modeling non-Gaussian and multimodal data for causal inference. The parameters of the mixture copula functions can be estimated efficiently by a routine expectation–maximization algorithm. A heuristic search algorithm based on the Bayesian information criterion is developed to estimate the network structure, and prediction can be further improved by selecting the best-scoring network out of multiple predictions obtained from random initial values. Our method outperforms Gaussian Bayesian networks and regular copula Bayesian networks in terms of modeling flexibility and prediction accuracy, as demonstrated using a cell signaling data set. We apply the proposed method to The Cancer Genome Atlas data to study the genetic and epigenetic pathways that underlie serous ovarian cancer.
url https://doi.org/10.1177/1176935117702389