An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets

<p>Abstract</p> <p>Background</p> <p>An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotat...

Bibliographic Details
Main Authors: Matthews Benjamin F, Tremblay Arianne, Hosseini Parsa, Alkharouf Nadim W
Format: Article
Language: English
Published: BMC 2010-07-01
Series: BMC Research Notes
Online Access: http://www.biomedcentral.com/1756-0500/3/183
id doaj-9d02b8869392488387e2ca4c7fb79602
record_format Article
spelling doaj-9d02b8869392488387e2ca4c7fb79602; 2020-11-25T01:54:57Z; eng; BMC; BMC Research Notes; 1756-0500; 2010-07-01; 3; 1; 183; 10.1186/1756-0500-3-183; An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets; Matthews Benjamin F; Tremblay Arianne; Hosseini Parsa; Alkharouf Nadim W; http://www.biomedcentral.com/1756-0500/3/183
collection DOAJ
language English
format Article
sources DOAJ
author Matthews Benjamin F
Tremblay Arianne
Hosseini Parsa
Alkharouf Nadim W
title An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets
publisher BMC
series BMC Research Notes
issn 1756-0500
publishDate 2010-07-01
description <p>Abstract</p> <p>Background</p> <p>An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great importance. One can easily be flooded with a large volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, enables users to interpret INDEL detection, SNP information, and allele calling. Extracting from such an analysis a measure of gene expression in the form of tag-counts, and furthermore annotating the reads, is therefore of significant value.</p> <p>Findings</p> <p>We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, which signifies how many times sequenced reads were found within the genomic ranges of functional annotations.</p> <p>Conclusions</p> <p>TASE is a powerful tool that facilitates the annotation of a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset. TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. The analysis is executed in an ultrafast and highly efficient manner, whether for a single-read or a paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease.</p>
url http://www.biomedcentral.com/1756-0500/3/183
_version_ 1724986049841594368
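
The tag-counting step described in the abstract (counting how many sequenced reads fall within the genomic range of each functional annotation) can be illustrated with a minimal Java sketch. This is not TASE's code: the class, field, and method names (TagCountSketch, Annotation, Read, countTags) and the sample coordinates are hypothetical, and the sketch keeps everything in memory rather than using the jTDS/SQL Server backend mentioned in the record. It assumes each read is reduced to a chromosome and a single alignment position.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: counts aligned reads per annotated genomic range.
// All names below are hypothetical and are not taken from TASE.
class TagCountSketch {

    /** An annotated genomic range with a homology-based function label. */
    static final class Annotation {
        final String chromosome;
        final int start; // inclusive
        final int end;   // inclusive
        final String function;

        Annotation(String chromosome, int start, int end, String function) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
            this.function = function;
        }
    }

    /** An aligned read, reduced to its chromosome and alignment position. */
    static final class Read {
        final String chromosome;
        final int position;

        Read(String chromosome, int position) {
            this.chromosome = chromosome;
            this.position = position;
        }
    }

    /**
     * Returns a map from function label to tag count, in annotation order.
     * Annotations that share a function label have their counts summed.
     */
    static Map<String, Integer> countTags(List<Annotation> annotations, List<Read> reads) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            int n = 0;
            for (Read r : reads) {
                boolean sameChromosome = r.chromosome.equals(a.chromosome);
                if (sameChromosome && r.position >= a.start && r.position <= a.end) {
                    n++;
                }
            }
            counts.merge(a.function, n, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up annotations and reads, for demonstration only.
        List<Annotation> annotations = Arrays.asList(
                new Annotation("chr1", 100, 200, "example function A"),
                new Annotation("chr1", 500, 650, "example function B"));
        List<Read> reads = Arrays.asList(
                new Read("chr1", 120),
                new Read("chr1", 199),
                new Read("chr1", 510),
                new Read("chr2", 150));

        for (Map.Entry<String, Integer> e : countTags(annotations, reads).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The nested loop is quadratic in the worst case and is used here only for readability. A database-backed tool could express the same containment test as an indexed range query, but how TASE does this is not stated in the record.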