scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

Bibliographic Details
Main Authors: Bobby Ranjan, Florian Schmidt, Wenjie Sun, Jinyu Park, Mohammad Amin Honardoost, Joanna Tan, Nirmala Arul Rayan, Shyam Prabhakar
Format: Article
Language: English
Published: BMC 2021-04-01
Series: BMC Bioinformatics
Subjects: ScRNA-seq, Clustering, Cell type annotation, Consensus method
Online Access: https://doi.org/10.1186/s12859-021-04028-4
id doaj-67a08cdd0d1742028e2d238ecb65a93c
record_format Article
spelling doaj-67a08cdd0d1742028e2d238ecb65a93c; 2021-04-18T11:51:44Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2021-04-01; 22; 1; 1; 15; 10.1186/s12859-021-04028-4; scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data; Bobby Ranjan, Florian Schmidt, Wenjie Sun, Jinyu Park, Mohammad Amin Honardoost, Joanna Tan, Nirmala Arul Rayan, Shyam Prabhakar (all authors: Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore); https://doi.org/10.1186/s12859-021-04028-4; ScRNA-seq; Clustering; Cell type annotation; Consensus method
collection DOAJ
issn 1471-2105
description Abstract
Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation.
Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations.
Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
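The integration step (1) in the abstract can be pictured with a small sketch. This is not the scConsensus API or the authors' algorithm, and it is written in Python rather than R; the function name `consensus_labels`, the `min_cells` threshold, and the fallback rule for rare label pairs are all assumptions made for illustration. It cross-tabulates an unsupervised cluster label and a supervised cell type label per cell and uses the well-supported pairs as consensus clusters. The refinement with differentially expressed genes (step 2) is not shown.

```python
from collections import Counter


def consensus_labels(unsupervised, supervised, min_cells=2):
    """Combine two per-cell labelings into one consensus labeling.

    Hypothetical illustration only: each cell is labelled with the pair
    (unsupervised cluster, supervised cell type). Pairs backed by fewer
    than `min_cells` cells are reassigned to the most frequent pair that
    shares the same unsupervised cluster.
    """
    if len(unsupervised) != len(supervised):
        raise ValueError("label vectors must have the same length")

    pairs = list(zip(unsupervised, supervised))
    counts = Counter(pairs)  # sparse contingency table of label pairs

    # For every unsupervised cluster, find its most frequent supervised label.
    dominant = {}
    for (u, s), n in counts.items():
        if u not in dominant or n > counts[(u, dominant[u])]:
            dominant[u] = s

    merged = []
    for u, s in pairs:
        kept = s if counts[(u, s)] >= min_cells else dominant[u]
        merged.append(f"{u}|{kept}")
    return merged
```

With six cells where cluster `c1` is mostly labelled `T` and cluster `c2` entirely `B`, the single `c1`/`B` cell is absorbed into `c1|T` under the default threshold, while `min_cells=1` keeps it as its own consensus cluster.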
topic ScRNA-seq
Clustering
Cell type annotation
Consensus method