Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

Abstract. Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as the...

Bibliographic Details
Main Authors: Shaoke Lou, Tianxiao Li, Daniel Spakowicz, Xiting Yan, Geoffrey Lowell Chupp, Mark Gerstein
Format: Article
Language: English
Published: BMC 2020-10-01
Series: BMC Bioinformatics
Subjects: Asthma; Asthma subtypes; Denoising autoencoder; Biomarker; Non-invasive
Online Access: http://link.springer.com/article/10.1186/s12859-020-03785-y
id doaj-3d7359749a0a4a5fa7ca97dd267509b1
record_format Article
spelling doaj-3d7359749a0a4a5fa7ca97dd267509b1
2020-11-25T03:53:57Z
eng
BMC
BMC Bioinformatics
1471-2105
2020-10-01
Volume 21, issue 1, pages 1-13
10.1186/s12859-020-03785-y
Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients
Shaoke Lou (Program in Computational Biology and Bioinformatics, Yale University)
Tianxiao Li (Program in Computational Biology and Bioinformatics, Yale University)
Daniel Spakowicz (Program in Computational Biology and Bioinformatics, Yale University)
Xiting Yan (Pulmonary and Critical Care, Yale School of Medicine)
Geoffrey Lowell Chupp (Pulmonary and Critical Care, Yale School of Medicine)
Mark Gerstein (Program in Computational Biology and Bioinformatics, Yale University)
Abstract. Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
http://link.springer.com/article/10.1186/s12859-020-03785-y
Asthma
Asthma subtypes
Denoising autoencoder
Biomarker
Non-invasive
collection DOAJ
language English
format Article
sources DOAJ
author Shaoke Lou
Tianxiao Li
Daniel Spakowicz
Xiting Yan
Geoffrey Lowell Chupp
Mark Gerstein
title Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-10-01
description Abstract. Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
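The first stage described in the abstract (a denoising autoencoder whose hidden units each receive contributions from every gene) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the data are synthetic, and the masking noise, tanh activation, plain gradient descent, learning rate, and the function names `train_dae`, `encode`, `reconstruction_loss`, and `top_genes_per_unit` are all assumptions made for the example. Only the default of 50 hidden units is taken from the abstract.

```python
import numpy as np


def train_dae(X, n_hidden=50, noise_frac=0.2, lr=0.01, epochs=300, seed=0):
    """Fit a one-hidden-layer denoising autoencoder by gradient descent.

    X is a (samples x genes) matrix, assumed to be scaled to about [0, 1].
    All hyperparameters are illustrative assumptions, not published values.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W_enc = rng.normal(0.0, 0.1, (n_genes, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_genes))
    b_dec = np.zeros(n_genes)
    for _ in range(epochs):
        # Masking noise: zero a random fraction of the inputs, then ask the
        # network to reconstruct the clean, uncorrupted matrix.
        keep = rng.random(X.shape) >= noise_frac
        X_noisy = X * keep
        H = np.tanh(X_noisy @ W_enc + b_enc)
        X_hat = H @ W_dec + b_dec
        # Gradients of 0.5 * mean-over-samples squared reconstruction error.
        err = (X_hat - X) / n_samples
        dZ = (err @ W_dec.T) * (1.0 - H ** 2)
        W_dec -= lr * (H.T @ err)
        b_dec -= lr * err.sum(axis=0)
        W_enc -= lr * (X_noisy.T @ dZ)
        b_enc -= lr * dZ.sum(axis=0)
    return W_enc, b_enc, W_dec, b_dec


def encode(X, W_enc, b_enc):
    """Map samples to the low-dimensional latent space (no input noise)."""
    return np.tanh(X @ W_enc + b_enc)


def reconstruction_loss(X, params):
    """0.5 * mean squared reconstruction error per sample, on clean input."""
    W_enc, b_enc, W_dec, b_dec = params
    X_hat = encode(X, W_enc, b_enc) @ W_dec + b_dec
    return 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))


def top_genes_per_unit(W_enc, k=10):
    """Indices of the k genes with the largest absolute encoder weight
    for each hidden unit; returns an array of shape (n_hidden, k)."""
    return np.argsort(-np.abs(W_enc), axis=0)[:k].T


# Synthetic stand-in for an expression matrix: 60 samples x 40 genes,
# with the first 10 genes elevated in the first 30 samples.
rng = np.random.default_rng(1)
X = rng.random((60, 40))
X[:30, :10] = 0.5 + 0.5 * X[:30, :10]

initial = train_dae(X, n_hidden=5, epochs=0)   # untrained weights
trained = train_dae(X, n_hidden=5, epochs=300)
print("loss before:", reconstruction_loss(X, initial))
print("loss after: ", reconstruction_loss(X, trained))
print("latent embedding shape:", encode(X, trained[0], trained[1]).shape)
print("top genes per hidden unit:\n", top_genes_per_unit(trained[0], k=3))
```

In a pipeline like the one the abstract outlines, the embedding returned by `encode` would feed the hierarchical clustering step, and the gene indices from `top_genes_per_unit` would be candidate inputs for the downstream random-forest classifier and support-vector regression, neither of which is shown here.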
topic Asthma
Asthma subtypes
Denoising autoencoder
Biomarker
Non-invasive
url http://link.springer.com/article/10.1186/s12859-020-03785-y