Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model

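Abstract: Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.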
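
The abstract's statement that UMI counts follow multinomial sampling with no zero inflation can be illustrated with a small simulation. The sketch below is not taken from the article: the relative abundances, UMI depth, cell count, and function names are all hypothetical. It draws a fixed number of UMIs per cell and compares the observed fraction of zero counts per gene with the multinomial expectation (1 - pi_j)^n.

```python
import random


def expected_zero_fraction(pi_j, n):
    """P(count == 0) for a gene with relative abundance pi_j when n UMIs are drawn."""
    return (1.0 - pi_j) ** n


def simulate_zero_fraction(pi, n, n_cells, seed=0):
    """Draw n UMIs per cell from a multinomial with probabilities pi and
    return, for each gene, the observed fraction of cells with a zero count."""
    rng = random.Random(seed)
    genes = range(len(pi))
    zeros = [0] * len(pi)
    for _ in range(n_cells):
        # Each UMI is assigned to one gene; genes never drawn have count zero.
        seen = set(rng.choices(genes, weights=pi, k=n))
        for j in genes:
            if j not in seen:
                zeros[j] += 1
    return [z / n_cells for z in zeros]


# Hypothetical relative abundances (sum to 1) and sequencing depth.
pi = [0.90, 0.09, 0.009, 0.001]
n_umis = 500

observed = simulate_zero_fraction(pi, n_umis, n_cells=2000)
expected = [expected_zero_fraction(p, n_umis) for p in pi]

for p, o, e in zip(pi, observed, expected):
    print(f"pi={p:<6} observed zeros={o:.3f} expected={e:.3f}")
```

Under these assumed settings the rarest gene is expected to be absent from about 61% of cells through sampling alone, so a high zero fraction for a low-abundance gene does not by itself indicate zero inflation.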

Bibliographic Details
Main Authors: F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry
Format: Article
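Language: English
Published: BMC 2019-12-01
Series: Genome Biology
Subjects: Gene expression; Single cell; RNA-Seq; Dimension reduction; Variable genes; Principal component analysis
Online Access: https://doi.org/10.1186/s13059-019-1861-6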
id doaj-af287f725372438e966f4cc53d8c0d74
record_format Article
spelling doaj-af287f725372438e966f4cc53d8c0d74
2020-12-27T12:20:13Z
eng
BMC
Genome Biology
1474-760X
2019-12-01
20(1):1-16
10.1186/s13059-019-1861-6
Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model
F. William Townes (Department of Biostatistics, Harvard University)
Stephanie C. Hicks (Department of Biostatistics, Johns Hopkins University)
Martin J. Aryee (Department of Biostatistics, Harvard University)
Rafael A. Irizarry (Department of Biostatistics, Harvard University)
Abstract Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
https://doi.org/10.1186/s13059-019-1861-6
Gene expression
Single cell
RNA-Seq
Dimension reduction
Variable genes
Principal component analysis
collection DOAJ
language English
format Article
sources DOAJ
author F. William Townes
Stephanie C. Hicks
Martin J. Aryee
Rafael A. Irizarry
title Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2019-12-01
description Abstract Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
topic Gene expression
Single cell
RNA-Seq
Dimension reduction
Variable genes
Principal component analysis
url https://doi.org/10.1186/s13059-019-1861-6