Constraints on the organization and information properties of DNA sequences

In an investigation which concentrated primarily on the two completely sequenced chloroplast genomes, one from a tobacco and one from a liverwort, an attempt has been made to discover some of the factors which produce order in DNA sequences. This was done by 1. looking in detail at doublet organiza...

Bibliographic Details
Main Author: Sibbald, Peter Ramsay
Language: English
Published: University of British Columbia 2010
Online Access: http://hdl.handle.net/2429/29284
id ndltd-UBC-oai-circle.library.ubc.ca-2429-29284
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-29284 2018-01-05T17:45:08Z Constraints on the organization and information properties of DNA sequences Sibbald, Peter Ramsay Science, Faculty of Botany, Department of Graduate 2010-10-18T17:26:17Z 2010-10-18T17:26:17Z 1988 Text Thesis/Dissertation http://hdl.handle.net/2429/29284 eng For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. University of British Columbia
collection NDLTD
language English
sources NDLTD
description In an investigation which concentrated primarily on the two completely sequenced chloroplast genomes, one from a tobacco and one from a liverwort, an attempt has been made to discover some of the factors which produce order in DNA sequences. This was done by 1. looking in detail at doublet organization throughout the genomes, 2. examining the ability of different methods to predict the existence of genes, based only on sequence organization, and 3. employing information theory to explore various levels of ordering in these sequences. The doublet analysis was performed on seven categories of DNA: tDNA, rDNA, ribosomal proteins, open reading frames not known to be genes (URF), other protein genes, noncoding regions and introns. The rDNA has the most unusual doublet properties of all categories, although all categories have, to a considerable extent, similar doublet properties. I suggest that these particular doublet properties facilitate accurate replication of the genome. In addition, it appears that doublets which have certain thermodynamic properties are more abundant than others, suggesting that there is a selection pressure at the level of doublets for certain thermodynamic properties. Nussinov's hypothesis, that complementary doublets have similar relative abundances due to inverted duplication events, has been tested and would not seem to explain the phenomenon. Fickett's method to predict whether URFs are genes was more successful than Sheperd's method. Fickett's method was modified for use on the chloroplast genomes and its rate of successful prediction increased substantially. This modified method will be useful for other chloroplast genomes as they are sequenced and also supports Fickett's contention that the method could be improved for use on specific groups.
The ability to predict genes based only on sequence data shows that the requirement to code for protein exerts a detectable amount of order on the gene sequence and that this order is distinguishable from the order in noncoding regions. Nearly all URFs greater than 200 base pairs in both plants are predicted to be genes. Informational analysis showed that most order is at the level of single and double bases, with a significant, lesser amount of order at the triplet and 4-plet level. This was true for both coding and noncoding regions in both plants. This is in contrast to earlier work (Rowe and Trainor), which found that in viruses there was a significant difference between 4-plet ordering in coding and noncoding regions. It is suggested that DNA may be optimized for replication rather than protein production. Several new problems and experiments have been suggested. === Science, Faculty of === Botany, Department of === Graduate
author Sibbald, Peter Ramsay
title Constraints on the organization and information properties of DNA sequences
publisher University of British Columbia
publishDate 2010
url http://hdl.handle.net/2429/29284
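
The doublet analysis and the informational analysis summarized in the description can be illustrated with a minimal sketch. It is not the thesis's own procedure: it assumes a common formulation in which doublet "relative abundance" is the observed doublet frequency divided by the product of the two single-base frequencies, and in which ordering at the k-plet level is read off from differences between block entropies. The function names (kplet_counts, doublet_relative_abundance, block_entropy, conditional_entropy) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def kplet_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-plets in seq, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def doublet_relative_abundance(seq: str) -> dict:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for each doublet present.

    Assumed definition: a ratio near 1 means the doublet occurs about as often
    as the single-base composition alone would predict.
    """
    singles = kplet_counts(seq, 1)
    doublets = kplet_counts(seq, 2)
    n1 = sum(singles.values())
    n2 = sum(doublets.values())
    return {
        xy: (c / n2) / ((singles[xy[0]] / n1) * (singles[xy[1]] / n1))
        for xy, c in doublets.items()
    }


def block_entropy(seq: str, k: int) -> float:
    """Shannon entropy, in bits, of the overlapping k-plet distribution."""
    counts = kplet_counts(seq, k)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(seq: str, k: int) -> float:
    """H_k - H_(k-1): uncertainty of a base given the k-1 bases before it.

    Lower values at a given k indicate more ordering at that level.
    Finite-sample edge effects can make the estimate slightly negative.
    """
    if k == 1:
        return block_entropy(seq, 1)
    return block_entropy(seq, k) - block_entropy(seq, k - 1)
```

Under these assumptions, one would compute the ratios and the entropy differences separately for each category of sequence (for example coding and noncoding regions) and compare them. For short sequences the estimates at higher k become unreliable, because most 4-plets are then observed only a few times.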