Generalizable and Scalable Visualization of Single-Cell Data Using Neural Networks


Bibliographic Details
Main Authors: Cho, Hyunghoon (Author), Berger Leighton, Bonnie (Author), Peng, Jian (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Mathematics (Contributor)
Format: Article
Language: English
Published: Cell Press, 2019-11-08T13:33:04Z.
Online Access: https://hdl.handle.net/1721.1/122802
LEADER 02371 am a22002053u 4500
001 122802
042 |a dc 
100 1 0 |a Cho, Hyunghoon  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Mathematics  |e contributor 
700 1 0 |a Berger Leighton, Bonnie  |e author 
700 1 0 |a Peng, Jian  |e author 
245 0 0 |a Generalizable and Scalable Visualization of Single-Cell Data Using Neural Networks 
260 |b Cell Press,   |c 2019-11-08T13:33:04Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/122802 
520 |a Visualization algorithms are fundamental tools for interpreting single-cell data. However, standard methods, such as t-distributed stochastic neighbor embedding (t-SNE), are not scalable to datasets with millions of cells, and the resulting visualizations cannot be generalized to analyze new datasets. Here we introduce net-SNE, a generalizable visualization approach that trains a neural network to learn a mapping function from high-dimensional single-cell gene-expression profiles to a low-dimensional visualization. We benchmark net-SNE on 13 different datasets and show that it achieves visualization quality and clustering accuracy comparable with t-SNE. Additionally, we show that the mapping function learned by net-SNE can accurately position entire new subtypes of cells from previously unseen datasets and can also be used to reduce the runtime of visualizing 1.3 million cells by 36-fold (from 1.5 days to an hour). Our work provides a framework for bootstrapping single-cell analysis from existing datasets. Researchers are applying single-cell RNA sequencing to increasingly large numbers of cells in diverse tissues and organisms. We introduce a data visualization tool, named net-SNE, which trains a neural network to embed single cells in 2D or 3D. Unlike previous approaches, our method allows new cells to be mapped onto existing visualizations, facilitating knowledge transfer across different datasets. Our method also vastly reduces the runtime of visualizing large datasets containing millions of cells. Keywords: data visualization; neural network; single-cell RNA sequencing 
520 |a National Institutes of Health (U.S.) (Grant R01GM081871) 
546 |a en 
655 7 |a Article 
773 |t Cell Systems