First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_

Abstract Background Exploring cellular responses to stimuli using extensive gene expression profiles has become a routine procedure performed on a daily basis. Raw and processed data from these studies are available on public databases but the opportunity to fully exploit such rich datasets is limit...

Bibliographic Details
Main Authors: Marco Moretto, Paolo Sonego, Ana B. Villaseñor-Altamirano, Kristof Engelen
Format: Article
Language: English
Published: BMC 2019-01-01
Series: BMC Bioinformatics
Subjects: Transcriptomic, Gene expression, Microarray, RNA-seq, Compendia, Data integration
Online Access: http://link.springer.com/article/10.1186/s12859-019-2643-6
id doaj-06e353c3ecb04cfa82237af96a1d04c4
record_format Article
spelling doaj-06e353c3ecb04cfa82237af96a1d04c4
2020-11-25T01:59:04Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-01-01
20119
10.1186/s12859-019-2643-6
First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_
Marco Moretto (Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach)
Paolo Sonego (Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach)
Ana B. Villaseñor-Altamirano (Laboratorio Internacional de Investigación Sobre el Genoma Humano, Universidad Nacional Autónoma De México)
Kristof Engelen (Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach)
collection DOAJ
language English
format Article
sources DOAJ
author Marco Moretto
Paolo Sonego
Ana B. Villaseñor-Altamirano
Kristof Engelen
title First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-01-01
description Abstract
Background: Exploring cellular responses to stimuli using extensive gene expression profiles has become a routine procedure performed on a daily basis. Raw and processed data from these studies are available on public databases, but the opportunity to fully exploit such rich datasets is limited due to the large heterogeneity of data formats. In recent years, several approaches have been proposed to effectively integrate gene expression data for analysis and exploration at a broader level. Despite the different goals and approaches towards gene expression data integration, the first step is common to any proposed method: data acquisition. Although it is seemingly straightforward to extract valuable information from a set of downloaded files, things can rapidly get complicated, especially as the number of experiments grows. Transcriptomic datasets are deposited in public databases with little regard to data format, and thus retrieving raw data might become a challenging task. While for RNA-seq experiments this problem is partially mitigated by the fact that raw reads are generally available on databases such as the NCBI SRA, for microarray experiments standards are not equally well established, or enforced during submission, and thus a multitude of data formats has emerged.
Results: COMMAND>_ is a specialized tool meant to simplify gene expression data acquisition. It is a flexible multi-user web application that allows users to search and download gene expression experiments, extract only the relevant information from experiment files, re-annotate microarray platforms, and present data in a simple and coherent data model for subsequent analysis.
Conclusions: COMMAND>_ facilitates the creation of local datasets of gene expression data coming from both microarray and RNA-seq experiments and may be a more efficient tool to build integrated gene expression compendia. COMMAND>_ is free and open-source software, including publicly available tutorials and documentation.
topic Transcriptomic
Gene expression
Microarray
RNA-seq
Compendia
Data integration
url http://link.springer.com/article/10.1186/s12859-019-2643-6