HetEnc: a deep learning predictive model for multi-type biological dataset

Abstract. Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. Effective integration of multi-platform data into the solution of a single- or multi-task classification problem, ho...

Bibliographic Details
Main Authors: Leihong Wu, Xiangwen Liu, Joshua Xu
Format: Article
Language: English
Published: BMC 2019-08-01
Series: BMC Genomics
Online Access: http://link.springer.com/article/10.1186/s12864-019-5997-2
id doaj-c58e044c26104b39bdbf7d13c7faacff
record_format Article
spelling doaj-c58e044c26104b39bdbf7d13c7faacff; 2020-11-25T03:36:03Z; eng; BMC; BMC Genomics; 1471-2164; 2019-08-01; 20; 1; 1; 10; 10.1186/s12864-019-5997-2; HetEnc: a deep learning predictive model for multi-type biological dataset; Leihong Wu (0), Xiangwen Liu (1), Joshua Xu (2); (0, 1, 2) Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration; http://link.springer.com/article/10.1186/s12864-019-5997-2
collection DOAJ
language English
format Article
sources DOAJ
author Leihong Wu
Xiangwen Liu
Joshua Xu
title HetEnc: a deep learning predictive model for multi-type biological dataset
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2019-08-01
description Abstract. Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. Effective integration of multi-platform data into the solution of a single- or multi-task classification problem, however, is both critical and challenging. In this study, we propose HetEnc, a novel deep learning-based approach for information domain separation. Results: HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks that represent the original gene expression data as high-level abstracted features. A six-layer fully-connected feed-forward neural network is then trained on these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset to demonstrate that it outperforms other machine learning approaches. Although multi-platform data were used in feature abstraction and model training, HetEnc does not need multi-platform data for prediction. This enables broader application of the trained model, because gene expression profiling for new samples needs only a single platform, which reduces cost. Thus, HetEnc provides a new solution to integrated gene expression analysis, accelerating modern biological research.
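The two-stage layout described in the abstract (three encoding networks feeding a six-layer fully-connected classifier) can be illustrated with a minimal forward-pass sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the function names, the layer widths, the activations, the single sigmoid output, and the choice to feed the same single-platform expression vector to all three encoders. The encoders are randomly initialised stand-ins, and the unsupervised training step is omitted entirely.

```python
import math
import random


def dense(in_dim, out_dim, rng):
    """Create a randomly initialised dense layer as (weights, biases)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def apply_dense(layer, x, activation):
    """Compute activation(W x + b) for one input vector x."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_hetenc_sketch(n_genes, code_dim=8,
                        hidden_dims=(32, 16, 16, 8, 8), seed=0):
    """Build three encoders and a six-layer classifier (hypothetical sizes)."""
    rng = random.Random(seed)
    # Stand-ins for the three encoding networks (untrained here).
    encoders = [dense(n_genes, code_dim, rng) for _ in range(3)]
    # Five hidden layers plus one output layer gives six dense layers.
    dims = [3 * code_dim, *hidden_dims, 1]
    classifier = [dense(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]
    return encoders, classifier


def predict(model, expression):
    """Encode one single-platform sample three ways, then classify it."""
    encoders, classifier = model
    features = []
    for encoder in encoders:
        features.extend(apply_dense(encoder, expression, math.tanh))
    hidden = features
    for layer in classifier[:-1]:
        hidden = apply_dense(layer, hidden, relu)
    return apply_dense(classifier[-1], hidden, sigmoid)[0]


model = build_hetenc_sketch(n_genes=20)
sample_rng = random.Random(1)
sample = [sample_rng.random() for _ in range(20)]  # toy expression vector
prob = predict(model, sample)
```

In this sketch `predict` takes one expression vector from a single platform, mirroring the abstract's point that multi-platform data are needed only during feature abstraction and training.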
url http://link.springer.com/article/10.1186/s12864-019-5997-2