First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae)
Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissu...
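Main Authors: | Victoria L. Sork, Sorel T. Fitz-Gibbon, Daniela Puiu, Marc Crepeau, Paul F. Gugger, Rachel Sherman, Kristian Stevens, Charles H. Langley, Matteo Pellegrini, Steven L. Salzberg |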
---|---|
Format: | Article |
Language: | English |
Published: | Oxford University Press, 2016-11-01 |
Series: | G3: Genes, Genomes, Genetics |
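Subjects: | adaptation; annotation; chloroplast; nuclear genome assembly; Quercus; GenPred; Shared Data Resources; Genomic Selection |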
Online Access: | http://g3journal.org/lookup/doi/10.1534/g3.116.030411 |
id |
doaj-e736641342ed4419951d41e54c4cdd2b |
---|---|
record_format |
Article |
spelling |
doaj-e736641342ed4419951d41e54c4cdd2b; 2021-07-02T05:46:18Z; eng; Oxford University Press; G3: Genes, Genomes, Genetics; 2160-1836; 2016-11-01; 6; 11; 3485; 3495; 10.1534/g3.116.030411; 5; First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae); Victoria L. Sork; Sorel T. Fitz-Gibbon; Daniela Puiu; Marc Crepeau; Paul F. Gugger; Rachel Sherman; Kristian Stevens; Charles H. Langley; Matteo Pellegrini; Steven L. Salzberg; Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ∼720–730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in > 1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37–52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.; http://g3journal.org/lookup/doi/10.1534/g3.116.030411; adaptation; annotation; chloroplast; nuclear genome assembly; Quercus; GenPred; Shared Data Resources; Genomic Selection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Victoria L. Sork; Sorel T. Fitz-Gibbon; Daniela Puiu; Marc Crepeau; Paul F. Gugger; Rachel Sherman; Kristian Stevens; Charles H. Langley; Matteo Pellegrini; Steven L. Salzberg |
spellingShingle |
Victoria L. Sork; Sorel T. Fitz-Gibbon; Daniela Puiu; Marc Crepeau; Paul F. Gugger; Rachel Sherman; Kristian Stevens; Charles H. Langley; Matteo Pellegrini; Steven L. Salzberg; First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae); G3: Genes, Genomes, Genetics; adaptation; annotation; chloroplast; nuclear genome assembly; Quercus; GenPred; Shared Data Resources; Genomic Selection |
author_facet |
Victoria L. Sork; Sorel T. Fitz-Gibbon; Daniela Puiu; Marc Crepeau; Paul F. Gugger; Rachel Sherman; Kristian Stevens; Charles H. Langley; Matteo Pellegrini; Steven L. Salzberg |
author_sort |
Victoria L. Sork |
title |
First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae) |
title_short |
First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae) |
title_full |
First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae) |
title_fullStr |
First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae) |
title_full_unstemmed |
First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae) |
title_sort |
first draft assembly and annotation of the genome of a california endemic oak quercus lobata née (fagaceae) |
publisher |
Oxford University Press |
series |
G3: Genes, Genomes, Genetics |
issn |
2160-1836 |
publishDate |
2016-11-01 |
description |
Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ∼720–730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in > 1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37–52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices. |
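The ortholog counts quoted in the description can be turned into the proportions a reader is likely to want. The following is a minimal sketch rather than part of the published analysis: it uses only the three counts stated in the abstract, and the variable and function names are illustrative assumptions.

```python
# Counts quoted in the abstract; all names here are illustrative, not from the paper.
BUSCO_SEARCHED = 956    # Benchmarking Universal Single-Copy Orthologs searched
BUSCO_COMPLETE = 863    # complete orthologs found
BUSCO_DUPLICATED = 450  # complete orthologs present in more than one copy


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the searched orthologs that were recovered as complete.
complete_pct = percent(BUSCO_COMPLETE, BUSCO_SEARCHED)

# Share of the complete orthologs that appear in more than one copy.
duplicated_pct = percent(BUSCO_DUPLICATED, BUSCO_COMPLETE)

# Complete orthologs found in exactly one copy.
single_copy = BUSCO_COMPLETE - BUSCO_DUPLICATED

print(f"complete: {complete_pct:.1f}% of searched")
print(f"duplicated: {duplicated_pct:.1f}% of complete")
print(f"single-copy complete orthologs: {single_copy}")
```

Under these counts, roughly 90% of the searched orthologs are complete and a little over half of those are duplicated. That fits the abstract's remarks about high heterozygosity and an overestimated gene count in Version 1.0, although the record itself does not say how the 37–52% figure was derived.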
topic |
adaptation; annotation; chloroplast; nuclear genome assembly; Quercus; GenPred; Shared Data Resources; Genomic Selection |
url |
http://g3journal.org/lookup/doi/10.1534/g3.116.030411 |