Short-read Chromosome Level Genome Assembly of Digitaria exilis

Bibliographic Details
Main Author: Gapa, Liubov
Other Authors: Krattinger, Simon G.
Language: English
Published: 2019
Online Access: Gapa, L. (2019). Short-read Chromosome Level Genome Assembly of Digitaria exilis. KAUST Research Repository. https://doi.org/10.25781/KAUST-0NS7F
http://hdl.handle.net/10754/660202
Description
Summary: Genomics has become an important tool in agriculture. Many modern crop breeding approaches, such as genomic selection and genome editing, require detailed information on the genomic composition of a crop species. However, the assembly of high-quality genome sequences is prone to technical artifacts that arise from inaccuracies in the sequencing technology and assembly algorithms. This is particularly true for the genomes of cereal crops, which are often very large, repeat-rich, and polyploid. Until recently, highly contiguous assembly of such cereal crop genomes from short-read data was possible mainly with proprietary assembly tools. In this work, we combined data generated with several short-read sequencing protocols and genomics technologies, including paired-end and mate-pair reads with multiple insert sizes, 10X linked reads, Hi-C contacts, and optical maps, to assemble a chromosome-level reference genome of Digitaria exilis (fonio millet) with open-source tools. Fonio millet is a semi-domesticated orphan cereal crop native to West Africa that has high potential for desert agriculture. We implemented TRITEX, a recently developed open-source pipeline for the assembly of large Triticeae genomes, and modified it to incorporate 10X and Hi-C reads into the assembly process independently. We then compared the TRITEX assembly to the fonio reference genome, which had previously been assembled from the same input data but using proprietary algorithms. We found the two assemblies to be highly similar in content, with high concordance in local order (Pearson correlation coefficient of 0.91 for alignments). However, we detected many small putative discrepancies between the two assemblies. While the TRITEX pipeline produced a highly contiguous genome assembly, further work is needed to characterize these putative discrepancies in more detail.
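
The summary reports concordance in local order as a Pearson coefficient over alignments but does not say how it was computed. As a minimal sketch, assuming each alignment between the two assemblies is reduced to a pair of start coordinates on the same chromosome, the coefficient could be obtained as below. The function name and the coordinate values are hypothetical and serve only as illustration; they are not data from the thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical alignment blocks on one chromosome:
# (start position in assembly A, start position in assembly B), in base pairs.
alignments = [
    (1_000, 1_200),
    (5_000, 5_100),
    (9_000, 9_300),
    (14_000, 13_800),
    (20_000, 20_500),
]

positions_a = [a for a, _ in alignments]
positions_b = [b for _, b in alignments]
r = pearson(positions_a, positions_b)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 means the aligned blocks appear in nearly the same order in both assemblies, while inversions or misplaced blocks pull it down. Small local rearrangements can leave the coefficient high, which is consistent with the summary's note that many small putative discrepancies remain despite the high concordance.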