A Non-Negative Matrix Factorization-Based Framework for the Analysis of Multi-Class Time-Series Single-Cell RNA-Seq Data
The development of single-cell RNA sequencing (scRNA-seq) has enabled gene expression to be quantified at single-cell resolution. Such advancement is expected to solve important issues that bulk RNA sequencing could not fully answer, such as inferring cell population heterogeneity, genetic variabili...
Main Authors: | Inuk Jung, Joungmin Choi, Heejoon Chae |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Gene expression, multi-class, single-cell, time-series |
Online Access: | https://ieeexplore.ieee.org/document/9018223/ |
id | doaj-b65ebfd2e99640c4a9da5e3ffaf06051 |
---|---|
record_format | Article |
spelling | Record doaj-b65ebfd2e99640c4a9da5e3ffaf06051, updated 2021-03-30T02:07:13Z; language: eng; publisher: IEEE; journal: IEEE Access, ISSN 2169-3536; published 2020-01-01; vol. 8, pp. 42342-42348; DOI 10.1109/ACCESS.2020.2977106; document no. 9018223. Title: A Non-Negative Matrix Factorization-Based Framework for the Analysis of Multi-Class Time-Series Single-Cell RNA-Seq Data. Authors: Inuk Jung (https://orcid.org/0000-0003-0675-4244), Department of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea; Joungmin Choi (https://orcid.org/0000-0003-2090-3330), Department of Computer Science, Sookmyung Women’s University, Seoul, South Korea; Heejoon Chae (https://orcid.org/0000-0002-0960-5829), Department of Computer Science, Sookmyung Women’s University, Seoul, South Korea. URL: https://ieeexplore.ieee.org/document/9018223/. Keywords: Gene expression; multi-class; single-cell; time-series |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Inuk Jung, Joungmin Choi, Heejoon Chae |
title | A Non-Negative Matrix Factorization-Based Framework for the Analysis of Multi-Class Time-Series Single-Cell RNA-Seq Data |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2020-01-01 |
description | The development of single-cell RNA sequencing (scRNA-seq) has enabled gene expression to be quantified at single-cell resolution. This advance is expected to address important questions that bulk RNA sequencing could not fully answer, such as inferring cell population heterogeneity and the genetic variability of cells, detecting rare cell types, and accurately predicting cell states and their localization. However, analyzing such large-scale data, especially when they are sampled at multiple time points, brings new challenges in mining informative genes compared with single-snapshot samples. The task becomes even more complicated when gene expression patterns are to be mined from time-series scRNA-seq datasets generated under multiple conditions, which constitute data with gene, condition, and time dimensions. Here, we focused on detecting gene expression patterns that capture the underlying biological differences between time-series scRNA-seq datasets of three different types of stem cells. The gene expression profiles of 2,128 time-series scRNA-seq samples from long-term hematopoietic stem cells (LT-HSC) and two of its progenitor cell types were analyzed using our framework. We detected condition-specific feature genes that achieved 90.03% classification accuracy between the three cell types. Investigating the genes and clusters detected by our framework, we found that cell-cycle-related genes showed significantly high variance between the three cell types. These results and the transcriptomic characteristics detected in our analysis were consistent with the original study. Collectively, the framework was able to detect biologically meaningful gene sets and expression patterns from multi-condition time-series scRNA-seq samples. |
topic | Gene expression, multi-class, single-cell, time-series |
url | https://ieeexplore.ieee.org/document/9018223/ |