Assistant Professor, Brigham Young University, Provo, Utah
Whole-genome assembly has improved rapidly as third-generation sequencing technologies such as PacBio HiFi and Oxford Nanopore Technologies (ONT) have bridged the gaps in complex genomes by providing high-accuracy, long-read data. Advances in these technologies have produced long average read lengths (>15 kbp) and read accuracies above 99% (>Q20, i.e. fewer than one error per 100 bases). These reads are particularly well suited to assembling long, repetitive regions of the genome. Current assembly methods merge reads that share overlapping sequence into longer, contiguous sections. In repetitive regions, this process tends to collapse the repeated copies into a single, shorter sequence instead of preserving the full length of the repeat array. Long reads avoid this problem when they span an entire repeat array in one continuous read. Heavy chain fibroin (h-fibroin), the gene that encodes the primary silk protein in Trichoptera and Lepidoptera, is long (often >20 kbp) and repetitive. Recent work showed that PacBio HiFi sequencing produced higher-quality assemblies of h-fibroin than the previous generation of ONT pores (R9.4.1) and chemistry, despite its shorter average read length. Recent advances in ONT chemistry and nanopores have raised quality scores, potentially enabling successful assembly of this gene region. To better understand these advances in ONT sequencing and their ability to provide high-quality, contiguous assemblies of complex genomes, we assess the quality of the h-fibroin silk gene in a genome assembly of the caddisfly Arctopsyche grandis (Trichoptera) generated with the newest ONT chemistry.
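The repeat-collapse problem can be illustrated with a toy example. The Python sketch below is a simplification and makes several assumptions. It compares sets of distinct fixed-length substrings (k-mers), which is the representation used by de Bruijn-graph assemblers, not the overlap-based merging of reads described above. It also ignores coverage depth, which real assemblers can use to estimate copy number. The locus sequences, the 8-bp repeat motif, and the helper names `distinct_kmers` and `toy_locus` are hypothetical. With 6-bp windows, a locus carrying two tandem copies of the motif and a locus carrying three copies yield identical sets, so the copy number cannot be recovered. With 18-bp windows, which are long enough to span the two-copy array plus one flanking base on each side, the two sets differ.

```python
def distinct_kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_locus(copies: int) -> str:
    """Hypothetical locus: unique left flank, tandem repeat array, unique right flank."""
    left_flank = "GATTACA"      # hypothetical 7-bp unique sequence
    repeat_unit = "AGCTTGCA"    # hypothetical 8-bp repeat motif
    right_flank = "CCGTAAT"     # hypothetical 7-bp unique sequence
    return left_flank + repeat_unit * copies + right_flank


two_copies = toy_locus(2)    # 30 bp
three_copies = toy_locus(3)  # 38 bp

short_k = 6   # shorter than one repeat unit
long_k = 18   # spans the two-copy array plus one flanking base on each side

# Short windows: every 6-mer of one locus also occurs in the other.
print(distinct_kmers(two_copies, short_k) == distinct_kmers(three_copies, short_k))  # True

# Long windows: the three-copy locus contains 18-mers the two-copy locus lacks.
print(distinct_kmers(two_copies, long_k) == distinct_kmers(three_copies, long_k))  # False
```

The 6-bp sets match because the repeat is periodic. Every short window inside the three-copy array also appears somewhere in the two-copy array. From those substrings alone, the number of copies is invisible. Reads that span the whole array do not have this ambiguity, which is why read length matters for long, repetitive genes such as h-fibroin.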