Sequence determinants in human polyadenylation site selection

Abstract. Background: Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its...

Bibliographic Details
Main Authors: Gautheret Daniel, Legendre Matthieu
Format: Article
Language: English
Published: BMC 2003-02-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/4/7
id doaj-37153118a5aa4f5c8834f7807dc16639
record_format Article
spelling doaj-37153118a5aa4f5c8834f7807dc16639 2020-11-25T02:26:20Z eng BMC BMC Genomics 1471-2164 2003-02-01 4 1 7 10.1186/1471-2164-4-7 Sequence determinants in human polyadenylation site selection Gautheret Daniel Legendre Matthieu http://www.biomedcentral.com/1471-2164/4/7
collection DOAJ
language English
format Article
sources DOAJ
author Gautheret Daniel
Legendre Matthieu
spellingShingle Gautheret Daniel
Legendre Matthieu
Sequence determinants in human polyadenylation site selection
BMC Genomics
author_facet Gautheret Daniel
Legendre Matthieu
author_sort Gautheret Daniel
title Sequence determinants in human polyadenylation site selection
title_short Sequence determinants in human polyadenylation site selection
title_full Sequence determinants in human polyadenylation site selection
title_fullStr Sequence determinants in human polyadenylation site selection
title_full_unstemmed Sequence determinants in human polyadenylation site selection
title_sort sequence determinants in human polyadenylation site selection
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2003-02-01
description Abstract. Background: Differential polyadenylation is a widespread mechanism in higher eukaryotes that produces mRNAs with different 3' ends in different contexts. It involves several alternative polyadenylation sites in the 3' UTR, each with its own strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong from weak polyadenylation sites, or true sites from randomly occurring signals. Results: We used human genomic sequences to retrieve the region downstream of polyadenylation signals, which is usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are associated equally with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%. Conclusion: The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.
url http://www.biomedcentral.com/1471-2164/4/7
work_keys_str_mv AT gautheretdaniel sequencedeterminantsinhumanpolyadenylationsiteselection
AT legendrematthieu sequencedeterminantsinhumanpolyadenylationsiteselection
_version_ 1724847763651297280
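
Illustrative note: the abstract describes candidate sites as A(A/U)UAAA hexamers flanked by U-rich upstream and downstream elements. The Python sketch below shows one simple way to locate such hexamers and measure the U content on either side. It is not the authors' method and not the ERPIN program; the function names `scan_signals` and `u_fraction` and the 30 nt default window are assumptions chosen for illustration only (the article itself analyzes -300/+300 nt flanking regions).

```python
import re

# A(A/U)UAAA, i.e. AAUAAA or AUUAAA, as named in the abstract.
# The lookahead lets overlapping hexamers be reported as separate hits.
HEXAMER = re.compile(r"(?=(A[AU]UAAA))")


def u_fraction(seq):
    """Return the fraction of U residues in seq (0.0 for an empty string)."""
    return seq.count("U") / len(seq) if seq else 0.0


def scan_signals(sequence, window=30):
    """List candidate poly(A) signals with flanking U content.

    Each hit is (start, hexamer, upstream_U_fraction, downstream_U_fraction).
    The window size is a hypothetical parameter, not a value from the article.
    """
    # Accept DNA or RNA input by mapping T to U.
    rna = sequence.upper().replace("T", "U")
    hits = []
    for match in HEXAMER.finditer(rna):
        start = match.start()
        end = start + 6
        upstream = rna[max(0, start - window):start]
        downstream = rna[end:end + window]
        hits.append(
            (start, match.group(1), u_fraction(upstream), u_fraction(downstream))
        )
    return hits


if __name__ == "__main__":
    for hit in scan_signals("GGUUUUUUAAUAAACCUUUUGG", window=4):
        print(hit)
```

A real classifier would need thresholds or a trained model on top of these raw fractions; the sketch only reports them.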