HetEnc: a deep learning predictive model for multi-type biological dataset
Abstract. Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis of multi-platform data. Effective integration of multi-platform data into the solution of a single- or multi-task classification problem, ho...
Main Authors: | Leihong Wu, Xiangwen Liu, Joshua Xu |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-08-01 |
Series: | BMC Genomics |
Online Access: | http://link.springer.com/article/10.1186/s12864-019-5997-2 |
id |
doaj-c58e044c26104b39bdbf7d13c7faacff |
---|---|
affiliation |
Leihong Wu, Xiangwen Liu, Joshua Xu: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration |
collection |
DOAJ |
author |
Leihong Wu Xiangwen Liu Joshua Xu |
issn |
1471-2164 |
description |
Abstract. Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis of multi-platform data. Effective integration of multi-platform data into the solution of a single- or multi-task classification problem, however, is critical and challenging. In this study, we proposed HetEnc, a novel deep learning-based approach for information domain separation. Results: HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks that represent the original gene expression data as high-level abstracted features. A six-layer fully-connected feed-forward neural network is then trained on these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset and showed that it outperforms the other machine learning approaches compared. Although multi-platform data are used for feature abstraction and model training, HetEnc does not need multi-platform data for prediction: new samples need to be profiled on only a single platform, which reduces the cost of gene expression profiling and broadens the applicability of the trained model. Thus, HetEnc provides a new solution for integrated gene expression analysis that can accelerate modern biological research. |
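The two-stage design summarized in the description (encoding networks that produce abstracted features, followed by a six-layer fully connected feed-forward classifier for each endpoint) can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation. Every name here (`Encoder`, `FeedForwardClassifier`, `abstract_features`) is hypothetical. Each encoder is assumed to be a single tanh layer, and the three encoders share that design, differing only in their weights, whereas the description says the real encoders are different networks. Concatenating the three codes, the layer widths, the ReLU and sigmoid activations, and counting "six layers" as six weight layers are also assumptions. All weights are random rather than trained. The sketch only shows how one single-platform expression vector flows through the encoders and the classifier to a probability.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random, untrained weights; a real model would learn these."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """One fully connected layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class Encoder:
    """Hypothetical encoder: one dense layer with tanh, mapping genes to a code."""

    def __init__(self, n_genes: int, code_dim: int, rng: random.Random):
        self.layer = init_layer(n_genes, code_dim, rng)

    def encode(self, x: Vector) -> Vector:
        return [math.tanh(z) for z in dense(x, self.layer)]


class FeedForwardClassifier:
    """Fully connected feed-forward net with len(widths) - 1 weight layers."""

    def __init__(self, widths: List[int], rng: random.Random):
        self.layers = [init_layer(a, b, rng) for a, b in zip(widths[:-1], widths[1:])]

    def predict_proba(self, features: Vector) -> float:
        h = features
        for i, layer in enumerate(self.layers):
            h = dense(h, layer)
            if i < len(self.layers) - 1:
                h = [max(0.0, z) for z in h]  # ReLU on hidden layers only
        return 1.0 / (1.0 + math.exp(-h[0]))  # sigmoid on the single output unit


def abstract_features(x: Vector, encoders: List[Encoder]) -> Vector:
    """Concatenate the codes from several encoders (an illustrative choice)."""
    return [z for enc in encoders for z in enc.encode(x)]


rng = random.Random(42)
N_GENES, CODE_DIM = 20, 4
# Three stand-in encoders (same hypothetical design, different random weights).
encoders = [Encoder(N_GENES, CODE_DIM, rng) for _ in range(3)]
# Six weight layers: 12 -> 32 -> 32 -> 16 -> 16 -> 8 -> 1
model = FeedForwardClassifier([3 * CODE_DIM, 32, 32, 16, 16, 8, 1], rng)

# At prediction time only one expression profile from a single platform is needed.
sample = [rng.gauss(0.0, 1.0) for _ in range(N_GENES)]
features = abstract_features(sample, encoders)
print(len(features), round(model.predict_proba(features), 3))
```

Training is deliberately omitted: in this sketch the encoders would be fit without labels and the classifier would be fit on their outputs with endpoint labels, mirroring the unsupervised and supervised modules named in the description, but how HetEnc actually trains them is not covered by this record.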