Analysis of sequence conservation at nucleotide resolution.

One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting th...

Bibliographic Details
Main Authors: Saurabh Asthana, Mikhail Roytberg, John Stamatoyannopoulos, Shamil Sunyaev
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2007-12-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC2230682?pdf=render
id doaj-7b4b3fbb83554107a2de82c065b1ee20
record_format Article
spelling doaj-7b4b3fbb83554107a2de82c065b1ee20 2020-11-25T02:19:18Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2007-12-01 3 12 e254 10.1371/journal.pcbi.0030254 Analysis of sequence conservation at nucleotide resolution. Saurabh Asthana Mikhail Roytberg John Stamatoyannopoulos Shamil Sunyaev http://europepmc.org/articles/PMC2230682?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Saurabh Asthana
Mikhail Roytberg
John Stamatoyannopoulos
Shamil Sunyaev
title Analysis of sequence conservation at nucleotide resolution.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2007-12-01
description One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans as evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
url http://europepmc.org/articles/PMC2230682?pdf=render
_version_ 1724876965672910848