Summary: | <p>Abstract</p> <p>Background</p> <p>Many RNA viruses do not have a single, representative genome but instead form a set of related variants that has been called a quasispecies. The sequence variability of such viruses presents a significant bioinformatics challenge. In order for the sequence information to be understood, the complete mutational spectrum needs to be distilled to a biologically relevant and analyzable representation.</p> <p>Results</p> <p>Here, we develop a "selection mapping" algorithm--QUASI--that identifies the positively selected variants of viral proteins. The key to the selection mapping algorithm is the identification of particular replacement mutations that are overabundant relative to silent mutations at each codon (<it>e.g.</it>, threonine at hemagglutinin position 262). Selection mapping identifies such replacement mutations as positively selected. Conversely, selection mapping recognizes negatively selected variants as mutational "noise" (<it>e.g.</it>, serine at hemagglutinin position 262).</p> <p>Conclusion</p> <p>Selection mapping is a fundamental improvement over earlier methods (<it>e.g.</it>, dN/dS) that identify positive selection at codons but do not identify which amino acids at these codons confer selective advantage. Using QUASI's selection maps, we characterize the selected mutational landscapes of influenza A H3 hemagglutinin, HIV-1 reverse transcriptase, and HIV-1 gp120.</p>
|