Assistant Professor, University of Illinois, Champaign, Illinois
Inferring the tree of life from large datasets poses distinct challenges. Large datasets provide evidence of relationships, but their information content must outweigh the noise. Limiting genomic matrices to only the most informative regions may produce sparse datasets whose patterns of missing data prevent many taxa from being included. For instance, in our combined dataset of cockroach transcriptomes and target-enriched genomes, most taxa have >90% missing data, and ~10% of taxa have 98% missing data. We present a phylogenetic super-tree inference of this dataset (1183 loci) designed to minimize dataset trimming and artefactual inferences. Historically, super-tree methods have often failed to assign meaningful node-support values. We address this problem by mapping node-support values from the subtrees onto the final tree. Because the inference is hierarchical, with each level of tree inference tied to a specific tier of dataset quality, the resulting support values are quality-aware. Using this approach, we present a phylogeny of Blattodea with an in-depth view of support for relationships. Under this framework, a number of previous phylogenetic uncertainties may be largely resolved (e.g., Mastotermes darwiniensis is sister to all other termites/Isoptera). We also show that some clades previously recovered with solid support may be artefacts of low-quality data. As a result, the remaining questions about dark nodes come into sharper focus. We conclude that quality-aware node-support values can be a valuable starting point for a robust series of phylogenetic support tests.
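The idea of mapping subtree support onto the final tree can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes rooted trees represented as sets of clades, a hypothetical function name `map_subtree_support`, made-up taxon labels and support values, and a simple matching rule in which a subtree supports a final-tree clade when the clade, restricted to the subtree's taxa, appears in that subtree.

```python
from collections import defaultdict


def map_subtree_support(final_clades, subtrees):
    """Collect subtree support values for each clade of the final tree.

    final_clades: iterable of frozensets of taxon names (rooted clades).
    subtrees: list of dicts with keys
        'tier'    -- label for the dataset-quality tier of the subtree
        'taxa'    -- frozenset of taxa sampled in the subtree
        'support' -- dict mapping clade (frozenset) -> support value
    Returns a dict: clade -> {tier: [support values]}.
    """
    mapped = {clade: defaultdict(list) for clade in final_clades}
    for clade in final_clades:
        for sub in subtrees:
            # Restrict the final-tree clade to the taxa this subtree sampled.
            restricted = clade & sub["taxa"]
            # A restricted clade with fewer than two taxa, or with every
            # taxon of the subtree, carries no grouping information.
            if len(restricted) < 2 or restricted == sub["taxa"]:
                continue
            if restricted in sub["support"]:
                mapped[clade][sub["tier"]].append(sub["support"][restricted])
    return {clade: dict(tiers) for clade, tiers in mapped.items()}


# Illustrative data only: taxa A-E, two subtrees from different quality tiers.
final_clades = [frozenset("AB"), frozenset("ABC")]
subtrees = [
    {
        "tier": "high",
        "taxa": frozenset("ABCD"),
        "support": {frozenset("AB"): 98, frozenset("ABC"): 90},
    },
    {
        "tier": "low",
        "taxa": frozenset("ABDE"),
        "support": {frozenset("AB"): 60},
    },
]

result = map_subtree_support(final_clades, subtrees)
for clade, tiers in result.items():
    print(sorted(clade), tiers)
```

Keeping the values grouped by tier, rather than averaging them, makes it visible when a clade is supported only by low-quality data.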