Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model
Abstract Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures suc...
Main Authors: | F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-12-01 |
Series: | Genome Biology |
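Subjects: | Gene expression; Single cell; RNA-Seq; Dimension reduction; Variable genes; Principal component analysis |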
Online Access: | https://doi.org/10.1186/s13059-019-1861-6 |
id |
doaj-af287f725372438e966f4cc53d8c0d74 |
---|---|
record_format |
Article |
spelling |
doaj-af287f725372438e966f4cc53d8c0d74; 2020-12-27T12:20:13Z; eng; BMC; Genome Biology; 1474-760X; 2019-12-01; 201116; 10.1186/s13059-019-1861-6; Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model; F. William Townes (0); Stephanie C. Hicks (1); Martin J. Aryee (2); Rafael A. Irizarry (3); (0) Department of Biostatistics, Harvard University; (1) Department of Biostatistics, Johns Hopkins University; (2) Department of Biostatistics, Harvard University; (3) Department of Biostatistics, Harvard University; Abstract: Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.; https://doi.org/10.1186/s13059-019-1861-6; Gene expression; Single cell; RNA-Seq; Dimension reduction; Variable genes; Principal component analysis |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry |
author_sort |
F. William Townes |
title |
Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model |
title_sort |
feature selection and dimension reduction for single-cell rna-seq based on a multinomial model |
publisher |
BMC |
series |
Genome Biology |
issn |
1474-760X |
publishDate |
2019-12-01 |
description |
Abstract Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets. |
topic |
Gene expression; Single cell; RNA-Seq; Dimension reduction; Variable genes; Principal component analysis |
url |
https://doi.org/10.1186/s13059-019-1861-6 |