VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder

Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. It is an important step for studying cell sub-populations and lineages, with an effective low-dimensional representation and visualization of the original scRNA-seq...

Bibliographic Details
Main Authors: Dongfang Wang, Jin Gu
Format: Article
Language: English
Published: Elsevier 2018-10-01
Series: Genomics, Proteomics & Bioinformatics
Online Access: http://www.sciencedirect.com/science/article/pii/S167202291830439X
id doaj-829ab2dc868347c2a2ffb2c2e2c4541c
record_format Article
spelling doaj-829ab2dc868347c2a2ffb2c2e2c4541c; 2020-11-24T23:58:07Z; eng; Elsevier; Genomics, Proteomics & Bioinformatics; ISSN 1672-0229; 2018-10-01; vol. 16, no. 5, pp. 320-331
VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder
Dongfang Wang (MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China)
Jin Gu (corresponding author; MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division & Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China)
http://www.sciencedirect.com/science/article/pii/S167202291830439X
collection DOAJ
language English
format Article
sources DOAJ
author Dongfang Wang
Jin Gu
title VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder
publisher Elsevier
series Genomics, Proteomics & Bioinformatics
issn 1672-0229
publishDate 2018-10-01
description Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. An effective low-dimensional representation and visualization of the original scRNA-seq data is an important step for studying cell sub-populations and lineages. At the single-cell level, transcriptional fluctuations are much larger than in the average of a cell population, and the low amount of RNA transcripts increases the rate of technical dropout events. Therefore, scRNA-seq data are much noisier than traditional bulk RNA-seq data. In this study, we proposed the deep variational autoencoder for scRNA-seq data (VASC), a deep multi-layer generative model for the unsupervised dimension reduction and visualization of scRNA-seq data. VASC can explicitly model the dropout events and find nonlinear hierarchical feature representations of the original data. Tested on over 20 datasets, VASC shows superior performance in most cases and exhibits broader dataset compatibility compared to four state-of-the-art dimension reduction and visualization methods. In addition, VASC provides better representations for very rare cell populations in the 2D visualization. As a case study, VASC successfully re-establishes the cell dynamics in pre-implantation embryos and identifies several candidate marker genes associated with early embryo development. Moreover, VASC also performs well on a 10× Genomics dataset with more cells and a higher dropout rate.
Keywords: Single cell RNA sequencing, Deep variational autoencoder, Dimension reduction, Visualization, Dropout
url http://www.sciencedirect.com/science/article/pii/S167202291830439X