Gene finding in Populus- The Bioinformatics of a EST program
Main Author: | |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: | KTH, Bioteknologi, 2003 |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3614 http://nbn-resolving.de/urn:isbn:91-7283-582-6 |
Summary: | During the course of the work described in this thesis, an EST sequencing project was initiated in the woody plant poplar for large-scale gene discovery in a tree species. Since most tree genomes are considerably large, EST sequencing was considered a cost-effective method for gene discovery. Initially, EST sequencing was performed on cDNA libraries prepared with mRNA isolated from woody tissues. This was the first such attempt to identify genes associated with the process of wood formation on a large scale. At this stage, most of the sequence data was manually curated, and a database was established to store the sequence information and make it available to biologists. One observation from this initial work on woody-tissue cDNA libraries was the large number of ESTs displaying no similarity to any known gene in the database, which prompted the suggestion that tree-specific genes might exist.

Subsequently, over 130,000 ESTs were sequenced from 19 different cDNA libraries, and a set of Perl scripts was written to provide an efficient, automated procedure for data handling. In this automated pipeline, the raw sequences output by the sequencing machines were subjected to quality control, BLAST searches and annotation, and the results were finally stored in an easily accessible, internet-based database. Data from sequencing cDNA libraries from leaves at two developmental stages, namely actively growing young poplar leaves and senescing leaves, were used to identify the changes in gene expression that occur during the induction of senescence. To identify genes that may be involved in wood formation, and to investigate whether tree-specific genes exist, a bioinformatics approach was taken: the EST composition of cDNA libraries prepared from woody tissues of poplar, birch and pine was compared with the ESTs and genomic sequence of Arabidopsis, which under normal circumstances does not form wood.

This comparison led to the conclusion that there may be few, if any, tree-specific genes. Using this data, we could also identify a set of genes whose expression was significantly upregulated in woody tissues. Finally, an assembly of the 130,000 ESTs was performed to obtain a glimpse into the genetic composition of trees, and this data was compared with the Arabidopsis genomic sequence to better understand the similarities and differences between trees and annuals at the molecular level. |
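The summary states that raw sequencer output passed through quality control before BLAST and annotation, but it does not say how the quality control worked. As a minimal sketch, assuming Phred-style per-base quality scores, one common approach trims low-quality bases from both ends of a read and rejects reads that become too short. The function name `trim_read`, the quality threshold of 20 and the minimum length of 100 bases are illustrative assumptions, not details taken from the thesis's Perl scripts.

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], min_q: int = 20,
              min_len: int = 100) -> Optional[Tuple[str, List[int]]]:
    """Hypothetical QC step: end-trim a read by Phred quality.

    Bases with quality below ``min_q`` are removed from the 5' and 3' ends
    (internal low-quality bases are kept). Returns None if fewer than
    ``min_len`` bases remain after trimming.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    # Advance the 5' boundary past low-quality bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Pull the 3' boundary back past low-quality bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None  # too short to be worth sending on to BLAST
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    # Toy read: low-quality bases at both ends are removed.
    print(trim_read("NNACGTAC", [5, 8, 30, 32, 35, 31, 12, 4], min_len=4))
    # -> ('ACGT', [30, 32, 35, 31])
```

Trimming only the ends keeps the sketch simple. Many real pipelines also use sliding-window trimming and vector or adapter removal before the similarity search.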
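The summary reports genes whose expression was "significantly upregulated" in woody tissues, and expression changes during senescence, both derived from comparing EST composition between cDNA libraries. It does not name the statistic used. One widely used option for EST or tag counts is the test of Audic and Claverie, which assumes Poisson-distributed counts. It gives the probability of observing y ESTs for a gene in a library of N2 reads, given x ESTs for that gene in a library of N1 reads. The sketch below assumes that test; whether the thesis used it is not stated. The function names `audic_claverie_p` and `upregulation_pvalue` are hypothetical.

```python
import math


def audic_claverie_p(x: int, y: int, n1: int, n2: int) -> float:
    """P(y | x) = r^y (x+y)! / (x! y! (1+r)^(x+y+1)), with r = n2 / n1.

    x: EST count for a gene in library 1 (n1 ESTs in total)
    y: EST count for the same gene in library 2 (n2 ESTs in total)
    Computed in log space with lgamma to avoid factorial overflow.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("library sizes must be positive")
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def upregulation_pvalue(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided P(Y >= y | x); small values suggest higher expression in library 2.

    Computed as 1 minus the lower tail, which is adequate for a sketch but
    loses relative precision for extremely small p-values.
    """
    lower_tail = sum(audic_claverie_p(x, k, n1, n2) for k in range(y))
    return max(0.0, 1.0 - lower_tail)


if __name__ == "__main__":
    # Hypothetical gene: 2 ESTs in a 10,000-EST leaf library versus
    # 20 ESTs in a 10,000-EST wood library.
    print(upregulation_pvalue(2, 20, 10_000, 10_000))
```

Because thousands of genes are tested at once, any p-value from this kind of test would normally be adjusted for multiple testing before a gene is called significantly upregulated.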