Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved]
Background: Analysis of scATAC-seq data has been recently scaled to thousands of cells. While processing of other types of single cell data was boosted by the implementation of alignment-free techniques, pipelines available to process scATAC-seq data still require large computational resources. We p...
Main Authors: | Valentina Giansanti, Ming Tang, Davide Cittaro |
---|---|
Format: | Article |
Language: | English |
Published: | F1000 Research Ltd, 2020-05-01 |
Series: | F1000Research |
Online Access: | https://f1000research.com/articles/9-199/v2 |
id |
doaj-79ccd3134f2845538ae9508202288026 |
---|---|
record_format |
Article |
spelling |
doaj-79ccd3134f2845538ae9508202288026
2020-11-25T01:23:19Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2020-05-01
9
10.12688/f1000research.22731.2
26547
Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved]
Valentina Giansanti [0], Ming Tang [1], Davide Cittaro [2]
[0] Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
[1] FAS informatics, Harvard University, Cambridge, MA, USA
[2] Center for Omics Sciences, IRCCS San Raffaele Institute, Milan, Italy
Background: Analysis of scATAC-seq data has been recently scaled to thousands of cells. While processing of other types of single cell data was boosted by the implementation of alignment-free techniques, pipelines available to process scATAC-seq data still require large computational resources. We propose here an approach based on pseudoalignment, which reduces the execution times and hardware needs at little cost in precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in quantification of known peaks; the cell groups identified are consistent with the ones identified by the standard method. We also found that cell identification is robust when analysis is performed using a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows for efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.
https://f1000research.com/articles/9-199/v2 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Valentina Giansanti Ming Tang Davide Cittaro |
spellingShingle |
Valentina Giansanti Ming Tang Davide Cittaro Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] F1000Research |
author_facet |
Valentina Giansanti Ming Tang Davide Cittaro |
author_sort |
Valentina Giansanti |
title |
Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] |
title_short |
Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] |
title_full |
Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] |
title_fullStr |
Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] |
title_full_unstemmed |
Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] |
title_sort |
fast analysis of scatac-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved] |
publisher |
F1000 Research Ltd |
series |
F1000Research |
issn |
2046-1402 |
publishDate |
2020-05-01 |
description |
Background: Analysis of scATAC-seq data has been recently scaled to thousands of cells. While processing of other types of single cell data was boosted by the implementation of alignment-free techniques, pipelines available to process scATAC-seq data still require large computational resources. We propose here an approach based on pseudoalignment, which reduces the execution times and hardware needs at little cost in precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in quantification of known peaks; the cell groups identified are consistent with the ones identified by the standard method. We also found that cell identification is robust when analysis is performed using a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows for efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations. |
url |
https://f1000research.com/articles/9-199/v2 |
work_keys_str_mv |
AT valentinagiansanti fastanalysisofscatacseqdatausingapredefinedsetofgenomicregionsversion2peerreview2approved AT mingtang fastanalysisofscatacseqdatausingapredefinedsetofgenomicregionsversion2peerreview2approved AT davidecittaro fastanalysisofscatacseqdatausingapredefinedsetofgenomicregionsversion2peerreview2approved |
_version_ |
1725122989786136576 |
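
The abstract in this record describes quantifying scATAC-seq signal against a predefined set of regions rather than peaks called de novo. The Python sketch below illustrates only that general idea, under loud assumptions: it is not the authors' pipeline and does not use kallisto or bustools. It swaps in a plain interval lookup for pseudoalignment, and the region coordinates, barcodes, positions, and function names (`merge_regions`, `build_index`, `count_fragments`) are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping or touching half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def build_index(merged):
    """Map each chromosome to parallel lists of starts, ends and region ids."""
    index = {}
    for region_id, (chrom, start, end) in enumerate(merged):
        starts, ends, ids = index.setdefault(chrom, ([], [], []))
        starts.append(start)
        ends.append(end)
        ids.append(region_id)
    return index


def count_fragments(fragments, index):
    """Count (barcode, region_id) hits for (barcode, chrom, position) records."""
    counts = defaultdict(int)
    for barcode, chrom, pos in fragments:
        if chrom not in index:
            continue  # position on a chromosome with no predefined region
        starts, ends, ids = index[chrom]
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < ends[i]:
            counts[(barcode, ids[i])] += 1
    return dict(counts)


# Hypothetical DHS-like regions and per-cell cut positions (toy data).
regions = [("chr1", 100, 200), ("chr1", 150, 300),
           ("chr1", 500, 600), ("chr2", 10, 50)]
fragments = [("AAAC", "chr1", 120), ("AAAC", "chr1", 299),
             ("AAAC", "chr1", 300), ("TTTG", "chr1", 550),
             ("TTTG", "chr2", 10), ("TTTG", "chr3", 5)]

merged = merge_regions(regions)
counts = count_fragments(fragments, build_index(merged))
print(merged)
print(counts)
```

Because the region set is fixed in advance, the resulting cell-by-region matrix has the same columns for every dataset, which is the property that makes a predefined reference convenient for comparing samples.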