First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae)

Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissu...

Bibliographic Details
Main Authors: Victoria L. Sork, Sorel T. Fitz-Gibbon, Daniela Puiu, Marc Crepeau, Paul F. Gugger, Rachel Sherman, Kristian Stevens, Charles H. Langley, Matteo Pellegrini, Steven L. Salzberg
Format: Article
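Language: English
Published: Oxford University Press 2016-11-01
Series: G3: Genes, Genomes, Genetics
Subjects: adaptation, annotation, chloroplast, nuclear genome assembly, Quercus, GenPred, Shared Data Resources, Genomic Selection
Online Access: http://g3journal.org/lookup/doi/10.1534/g3.116.030411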
id doaj-e736641342ed4419951d41e54c4cdd2b
record_format Article
spelling doaj-e736641342ed4419951d41e54c4cdd2b
2021-07-02T05:46:18Z
eng
Oxford University Press
G3: Genes, Genomes, Genetics
2160-1836
2016-11-01
6
11
3485
3495
10.1534/g3.116.030411
5
First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae)
Victoria L. Sork
Sorel T. Fitz-Gibbon
Daniela Puiu
Marc Crepeau
Paul F. Gugger
Rachel Sherman
Kristian Stevens
Charles H. Langley
Matteo Pellegrini
Steven L. Salzberg
Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ∼720–730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in > 1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37–52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.
http://g3journal.org/lookup/doi/10.1534/g3.116.030411
adaptation
annotation
chloroplast
nuclear genome assembly
Quercus
GenPred
Shared Data Resources
Genomic Selection
collection DOAJ
language English
format Article
sources DOAJ
author Victoria L. Sork
Sorel T. Fitz-Gibbon
Daniela Puiu
Marc Crepeau
Paul F. Gugger
Rachel Sherman
Kristian Stevens
Charles H. Langley
Matteo Pellegrini
Steven L. Salzberg
author_sort Victoria L. Sork
title First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae)
title_sort first draft assembly and annotation of the genome of a california endemic oak quercus lobata née (fagaceae)
publisher Oxford University Press
series G3: Genes, Genomes, Genetics
issn 2160-1836
publishDate 2016-11-01
description Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ∼720–730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in > 1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37–52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.
topic adaptation
annotation
chloroplast
nuclear genome assembly
Quercus
GenPred
Shared Data Resources
Genomic Selection
url http://g3journal.org/lookup/doi/10.1534/g3.116.030411
_version_ 1721338227377307648
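
The ortholog and gene-count figures quoted in the description can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' method: the function and constant names are hypothetical, and the record does not say how "overestimated by 37–52%" is defined, so both plausible readings are computed and labeled as assumptions.

```python
# Figures copied from the description field of this record.
BUSCO_SEARCHED = 956       # orthologs searched
BUSCO_COMPLETE = 863       # complete orthologs found
BUSCO_DUPLICATED = 450     # complete orthologs present in > 1 copy
PREDICTED_GENES = 61_773   # predicted protein-coding genes, Version 1.0
OVERESTIMATE = (0.37, 0.52)  # stated overestimate range


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def corrected_if_excess_fraction(predicted: int, frac: float) -> int:
    """ASSUMPTION A: 'frac' of the predicted genes are excess,
    so the corrected count is predicted * (1 - frac)."""
    return round(predicted * (1.0 - frac))


def corrected_if_inflation(predicted: int, frac: float) -> int:
    """ASSUMPTION B: the prediction exceeds the true count by 'frac',
    so the corrected count is predicted / (1 + frac)."""
    return round(predicted / (1.0 + frac))


if __name__ == "__main__":
    print(f"Complete orthologs: {percent(BUSCO_COMPLETE, BUSCO_SEARCHED):.2f}% of searched")
    print(f"Duplicated: {percent(BUSCO_DUPLICATED, BUSCO_COMPLETE):.2f}% of complete")
    for frac in OVERESTIMATE:
        a = corrected_if_excess_fraction(PREDICTED_GENES, frac)
        b = corrected_if_inflation(PREDICTED_GENES, frac)
        print(f"Overestimate {frac:.0%}: reading A gives {a}, reading B gives {b}")
```

Under either reading, more than half of the complete orthologs appearing in more than one copy is consistent with the description's statement that the Version 1.0 gene count is inflated.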