GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure

Background: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are becoming larger, and combining multiple experiments from sequence repositor...


Bibliographic Details
Main Authors: Bender, M.R. (Author), Biggs, T.D. (Author), Feltus, F.A. (Author), Ficklin, S.P. (Author), Hadish, J.A. (Author), Honaas, L. (Author), McKnight, C.B. (Author), Shealy, B.T. (Author), Smith, M.C. (Author), Wytko, C. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2022
Subjects:
RNA
Online Access: View Fulltext in Publisher
LEADER 03611nam a2200601Ia 4500
001 10.1186-s12859-022-04629-7
008 220706s2022 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure 
260 0 |b BioMed Central Ltd  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-022-04629-7 
520 3 |a Background: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are becoming larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. Results: GEMmaker is an nf-core compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned, containerized software that can be executed on a single workstation, an institutional compute cluster, a Kubernetes platform, or the cloud. GEMmaker supports popular alignment and quantification tools, providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. Conclusions: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite limited data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions. © 2022, The Author(s). 
650 0 4 |a article 
650 0 4 |a Co-expression networks 
650 0 4 |a differential gene expression 
650 0 4 |a Differential gene expression 
650 0 4 |a Differential gene expressions 
650 0 4 |a Digital storage 
650 0 4 |a Gene co-expression network 
650 0 4 |a gene expression 
650 0 4 |a Gene expression 
650 0 4 |a Gene expression matrix 
650 0 4 |a Genes expression 
650 0 4 |a Information management 
650 0 4 |a information storage 
650 0 4 |a matrix 
650 0 4 |a Nextflow 
650 0 4 |a Program processors 
650 0 4 |a protein expression 
650 0 4 |a Reusability 
650 0 4 |a RNA 
650 0 4 |a RNA sequencing 
650 0 4 |a RNA-seq 
650 0 4 |a RNA-Seq datum 
650 0 4 |a software 
650 0 4 |a workflow 
650 0 4 |a Workflows 
650 0 4 |a Work-flows 
700 1 0 |a Bender, M.R.  |e author 
700 1 0 |a Biggs, T.D.  |e author 
700 1 0 |a Feltus, F.A.  |e author 
700 1 0 |a Ficklin, S.P.  |e author 
700 1 0 |a Hadish, J.A.  |e author 
700 1 0 |a Honaas, L.  |e author 
700 1 0 |a McKnight, C.B.  |e author 
700 1 0 |a Shealy, B.T.  |e author 
700 1 0 |a Smith, M.C.  |e author 
700 1 0 |a Wytko, C.  |e author 
773 |t BMC Bioinformatics