A Non-Negative Matrix Factorization-Based Framework for the Analysis of Multi-Class Time-Series Single-Cell RNA-Seq Data

Bibliographic Details
Main Authors: Inuk Jung, Joungmin Choi, Heejoon Chae
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, vol. 8, pp. 42342-42348
Subjects: Gene expression, multi-class, single-cell, time-series
Online Access: https://ieeexplore.ieee.org/document/9018223/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2977106
Source: DOAJ
Affiliations:
Inuk Jung (ORCID 0000-0003-0675-4244), Department of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea
Joungmin Choi (ORCID 0000-0003-2090-3330), Department of Computer Science, Sookmyung Women’s University, Seoul, South Korea
Heejoon Chae (ORCID 0000-0002-0960-5829), Department of Computer Science, Sookmyung Women’s University, Seoul, South Korea

Full description
The development of single-cell RNA sequencing (scRNA-seq) has enabled gene expression to be quantified at single-cell resolution. This advance is expected to address important questions that bulk RNA sequencing could not fully answer, such as inferring cell population heterogeneity and the genetic variability of cells, detecting rare cell types, and accurately predicting cell states and their localization. However, analyzing such large-scale data, especially data sampled at multiple time points, poses new challenges for mining informative genes compared with single-snapshot samples. The task becomes even more complicated when gene expression patterns must be mined from time-series scRNA-seq datasets generated under multiple conditions, which together form data with gene, condition, and time dimensions. Here, we focused on detecting gene expression patterns that capture the underlying biological differences among time-series scRNA-seq datasets of three different stem cell types. Using our framework, we analyzed the gene expression profiles of 2,128 time-series scRNA-seq samples from long-term hematopoietic stem cells (LT-HSCs) and two of their progenitor cell types. We detected condition-specific feature genes that achieved 90.03% classification accuracy among the three cell types. Examining the genes and clusters detected by our framework, we found that cell cycle-related genes showed significantly high variance among the three cell types. These results and the transcriptomic characteristics detected in our analysis were consistent with the original study. Collectively, the framework successfully detected biologically meaningful gene sets and expression patterns from multi-condition time-series scRNA-seq samples.