MultiSeq: unifying sequence and structure data for evolutionary analysis

Abstract. Background: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acid...


Bibliographic Details
Main Authors: Wright Dan, Eargle John, Roberts Elijah, Luthey-Schulten Zaida
Format: Article
Language: English
Published: BMC 2006-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/382
id doaj-7279712ec8604f36b887750e9ff1a65d
record_format Article
spelling doaj-7279712ec8604f36b887750e9ff1a65d 2020-11-24T20:59:25Z eng BMC BMC Bioinformatics 1471-2105 2006-08-01 7 1 382 10.1186/1471-2105-7-382 MultiSeq: unifying sequence and structure data for evolutionary analysis Wright Dan Eargle John Roberts Elijah Luthey-Schulten Zaida http://www.biomedcentral.com/1471-2105/7/382
collection DOAJ
language English
format Article
sources DOAJ
author Wright Dan
Eargle John
Roberts Elijah
Luthey-Schulten Zaida
title MultiSeq: unifying sequence and structure data for evolutionary analysis
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-08-01
description Abstract
Background: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes.
Results: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins.
Conclusion: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: http://www.scs.uiuc.edu/~schulten/multiseq/
url http://www.biomedcentral.com/1471-2105/7/382