GATA: a graphic alignment tool for comparative sequence analysis

Abstract. Background: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when...


Bibliographic Details
Main Authors: Nix David A, Eisen Michael B
Format: Article
Language: English
Published: BMC 2005-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/9
id doaj-519fae276eec4e86b52459d68d52580f
record_format Article
spelling doaj-519fae276eec4e86b52459d68d52580f 2020-11-25T00:26:07Z eng BMC BMC Bioinformatics 1471-2105 2005-01-01 6 1 9 10.1186/1471-2105-6-9 GATA: a graphic alignment tool for comparative sequence analysis Nix David A Eisen Michael B http://www.biomedcentral.com/1471-2105/6/9
collection DOAJ
language English
format Article
sources DOAJ
author Nix David A
Eisen Michael B
spellingShingle Nix David A
Eisen Michael B
GATA: a graphic alignment tool for comparative sequence analysis
BMC Bioinformatics
author_facet Nix David A
Eisen Michael B
author_sort Nix David A
title GATA: a graphic alignment tool for comparative sequence analysis
title_sort gata: a graphic alignment tool for comparative sequence analysis
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-01-01
description Abstract
Background: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein-coding sequences: functional constraints on proteins provide strong selective pressure against sequence inversions and minimize sequence duplications and feature shuffling. For non-coding sequences, this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
Results: To address some of these issues, we created a stand-alone, platform-independent graphic alignment tool for comparative sequence analysis (GATA, http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. Each sub-alignment is graphed as two shaded boxes, one for each sequence, connected by a line, using the coordinate system of their parent sequence. Shading and colour are used to indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.
Conclusions: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool suited to pairwise analysis of non-coding sequences of 0–200 kb. It functions independently of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation that is well suited for comparative sequence analysis.
url http://www.biomedcentral.com/1471-2105/6/9
work_keys_str_mv AT nixdavida gataagraphicalignmenttoolforcomparativesequenceanalysis
AT eisenmichaelb gataagraphicalignmenttoolforcomparativesequenceanalysis
_version_ 1725345922523594752
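
The Results paragraph of the abstract describes collecting every small, ungapped sub-alignment above a low cut-off score, in either orientation, and placing each one in the coordinate system of its parent sequence. The sketch below illustrates that idea only. It is not GATA's method: GATA uses NCBI-BLASTN with post-processing, whereas this example substitutes a brute-force fixed-window scan. The names `SubAlignment`, `find_sub_alignments`, `window`, and `min_score`, and the match/mismatch scores of +1/-1, are assumptions made for illustration.

```python
from typing import Iterator, List, NamedTuple, Tuple


class SubAlignment(NamedTuple):
    """One ungapped hit, in parent-sequence coordinates (hypothetical type)."""
    start_a: int      # 0-based start in sequence A
    start_b: int      # 0-based start in sequence B, forward-strand coordinates
    length: int       # window length; no gaps are allowed
    score: int
    orientation: str  # "+" same strand, "-" reverse complement


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _window_hits(a: str, b: str, window: int, min_score: int,
                 match: int, mismatch: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, score) for every window pair scoring at least min_score.

    Brute force, O(len(a) * len(b) * window): a stand-in for BLASTN,
    assumed here only to keep the sketch self-contained.
    """
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            score = sum(match if x == y else mismatch
                        for x, y in zip(a[i:i + window], b[j:j + window]))
            if score >= min_score:
                yield i, j, score


def find_sub_alignments(a: str, b: str, window: int = 8, min_score: int = 6,
                        match: int = 1, mismatch: int = -1) -> List[SubAlignment]:
    """Collect ungapped window hits between a and b on both strands."""
    hits = [SubAlignment(i, j, window, s, "+")
            for i, j, s in _window_hits(a, b, window, min_score, match, mismatch)]
    rc_b = reverse_complement(b)
    for i, j, s in _window_hits(a, rc_b, window, min_score, match, mismatch):
        # Map the start on the reverse-complemented copy back to forward
        # coordinates, so both boxes use their parent sequence's coordinates.
        hits.append(SubAlignment(i, len(b) - j - window, window, s, "-"))
    return hits


if __name__ == "__main__":
    seq_a = "CCCCGATTACAGCCCC"
    seq_b = "TTTTCTGTAATCTTTT"  # carries the reverse complement of GATTACAG
    for hit in find_sub_alignments(seq_a, seq_b, window=8, min_score=8):
        print(hit)
```

Each returned record carries what is needed to draw one box per sequence and a connecting line: a start and length on each sequence, plus a score and orientation that could drive shading and colour. Because hits are collected independently of one another, nothing in the sketch assumes that conserved elements are collinear.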