Genomics has become an important tool in agriculture. Many modern crop breeding approaches, such as genomic selection and genome editing, require detailed information on the genomic composition of a crop species. However, the assembly of high-quality genome sequences is prone to technical artifacts that arise from inaccuracies in the sequencing technology and assembly algorithms. This is particularly true for the genomes of cereal crops, which are often very large, repeat-rich, and polyploid. Until recently, highly continuous assembly of such cereal crop genomes from short-read data was possible mainly with proprietary assembly tools. In this work, we combined data generated with several short-read sequencing protocols and genomics technologies, including paired-end and mate-pair reads with multiple insert sizes, 10X linked reads, Hi-C contacts, and optical maps, to assemble a chromosome-level reference genome of Digitaria exilis (fonio millet) with open-source tools. Fonio millet is a semi-domesticated cereal orphan crop native to West Africa that has high potential for desert agriculture. We implemented the TRITEX pipeline, a recently developed open-source pipeline for the assembly of large Triticeae genomes, and modified it to incorporate 10X and Hi-C reads into the assembly process independently. We then compared the TRITEX assembly to the fonio reference genome, which had previously been assembled from the same input data but with proprietary algorithms. We found the two assemblies to be highly similar in content, with high concordance in local order (Pearson correlation coefficient of 0.91 for alignments). However, we detected many small putative discrepancies between the two assemblies. While the TRITEX pipeline produced a highly continuous genome assembly, further work is needed to characterize these putative discrepancies in more detail.
KAUST Research Repository