A general index for linear and nonlinear correlations for high dimensional genomic data

Abstract Background: With the advance of high-throughput sequencing, high-dimensional data are generated. Detecting dependence/correlation between these datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction. RNA-sequencing data...

Bibliographic Details
Main Authors: Zhihao Yao, Jing Zhang, Xiufen Zou
Format: Article
Language: English
Published: BMC 2020-11-01
Series: BMC Genomics
Subjects: High-dimensional data; Nonlinear correlation; RV-coefficient
Online Access: https://doi.org/10.1186/s12864-020-07246-x
id doaj-60591531790a4f0bacebabbdf650ee4f
record_format Article
spelling doaj-60591531790a4f0bacebabbdf650ee4f; 2020-12-06T12:25:32Z; eng; BMC; BMC Genomics; ISSN 1471-2164; 2020-11-01; vol. 21, no. 1, pp. 1-14; doi:10.1186/s12864-020-07246-x; A general index for linear and nonlinear correlations for high dimensional genomic data; Zhihao Yao, Jing Zhang, Xiufen Zou (all: School of Mathematics and Statistics, Wuhan University); keywords: High-dimensional data, Nonlinear correlation, RV-coefficient
collection DOAJ
language English
format Article
sources DOAJ
author Zhihao Yao
Jing Zhang
Xiufen Zou
author_sort Zhihao Yao
title A general index for linear and nonlinear correlations for high dimensional genomic data
title_sort general index for linear and nonlinear correlations for high dimensional genomic data
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2020-11-01
description Abstract Background: With the advance of high-throughput sequencing, high-dimensional data are generated. Detecting dependence/correlation between these datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction. RNA-sequencing data are widely used to construct gene regulatory networks. Such networks could be more accurate when methylation data, copy number aberration data and other types of data are introduced. Consequently, a general index for detecting relationships between high-dimensional data is indispensable. Results: We proposed a Kernel-Based RV-coefficient, named KBRV, for testing both linear and nonlinear correlation between two matrices by introducing kernel functions into RV2 (the modified RV-coefficient). A permutation test and other validation methods were used on simulated data to test the significance and rationality of KBRV. To demonstrate the advantages of KBRV in constructing gene regulatory networks, we applied this index to real datasets (ovarian cancer datasets and exon-level RNA-Seq data in human myeloid differentiation) to illustrate its superiority over vector correlation. Conclusions: We concluded that KBRV is an efficient index for detecting both linear and nonlinear relationships in high-dimensional data. The correlation method for high-dimensional data has possible applications in the construction of gene regulatory networks.
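The idea summarised in the abstract (a kernelised version of the modified RV-coefficient, assessed with a permutation test) can be illustrated with a minimal sketch. This is not the authors' implementation, and the record does not give the exact KBRV formula. The sketch assumes that RV2 is computed from sample-by-sample cross-product matrices with their diagonals removed, and that the kernel variant simply swaps in centred Gaussian Gram matrices. The function names, the median bandwidth heuristic and the number of permutations are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_gram(X, sigma=None):
    """Gaussian kernel Gram matrix between the rows (samples) of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        # Median heuristic for the bandwidth: an assumption, not from the source.
        med = np.median(d2[np.triu_indices_from(d2, k=1)])
        sigma = np.sqrt(med / 2.0) if med > 0 else 1.0
    return np.exp(-d2 / (2.0 * sigma ** 2))


def center(K):
    """Double-centre a Gram matrix: H K H with H = I - 11'/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def rv2_from_gram(Kx, Ky):
    """Modified RV: zero the diagonals, then take the matrix cosine."""
    A = Kx - np.diag(np.diag(Kx))
    B = Ky - np.diag(np.diag(Ky))
    return np.sum(A * B) / np.sqrt(np.sum(A * A) * np.sum(B * B))


def linear_rv2(X, Y):
    """RV2 on ordinary cross-product matrices of column-centred data."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return rv2_from_gram(Xc @ Xc.T, Yc @ Yc.T)


def kernel_rv2(X, Y):
    """Hypothetical kernel-based RV2 using centred Gaussian Gram matrices."""
    return rv2_from_gram(center(gaussian_gram(X)), center(gaussian_gram(Y)))


def permutation_pvalue(X, Y, stat=kernel_rv2, n_perm=200, seed=0):
    """Permute the sample order of Y to approximate the null distribution."""
    rng = np.random.default_rng(seed)
    observed = stat(X, Y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        if stat(X, Y[perm]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 3))                     # 80 samples, 3 variables
    Y = X ** 2 + 0.1 * rng.normal(size=(80, 3))      # purely nonlinear link
    print("linear RV2:", linear_rv2(X, Y))
    print("kernel RV2 and p-value:", permutation_pvalue(X, Y))
```

Both matrices must share the same samples in the same row order; the variables (columns) may differ in number and type, which is the setting the abstract describes for combining expression, methylation and copy number data.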
topic High-dimensional data
Nonlinear correlation
RV-coefficient
url https://doi.org/10.1186/s12864-020-07246-x