Phylogenomic blind spots: The limits of UCE and BUSCO loci in the presence of gene flow
Nicole M Foley, William J Murphy
Abstract
Ultraconserved elements (UCEs) and BUSCO genes are commonly used markers in reduced-representation phylogenomic studies. They are valued for their evolutionary conservation, ease of alignment, and cost-effectiveness in generating phylogenomic datasets for non-model species. Recombination-aware phylogenomic approaches reveal that, as historical and recent gene flow increases, the species-tree signal may be confined to genomic regions with low recombination rates, whereas introgression-associated alleles are most often found in high-recombining regions. In this study, we aimed to determine whether widely used UCE and BUSCO datasets can reliably recover the species tree in mammalian clades where extensive introgression has previously been documented using recombination-aware methods. Our analyses indicate that UCE and BUSCO loci are not sampled randomly from the genome and are underrepresented on mammalian sex chromosomes and in other low-recombining genomic regions. Concatenation and coalescent-based phylogenomic analyses across 12 clades with varying degrees of gene flow showed that UCE and BUSCO datasets do not recover the true species topology when introgression is frequent. Although neutral loci are generally preferred for phylogenomic analyses, per-base constraint measures estimated genome-wide show that UCE and BUSCO loci originate from genomic regions under very strong selective constraint. Comparisons of branch lengths and node heights from trees based on accelerated, neutral, and conserved PhyloP datasets revealed that those derived from UCE and BUSCO data are compressed relative to trees from neutral regions. We conclude by proposing mitigation strategies to address some of the issues identified in this study, thereby improving the use of UCE-based or other target-enrichment methods in phylogenomics.
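As an illustration of what "not sampled randomly from the genome" can mean in practice, the following minimal sketch compares observed locus counts per chromosome with the counts expected if loci were placed in proportion to chromosome length. It is not the authors' analysis: the function names, the length-proportional null model, and the toy numbers are all assumptions made for illustration.

```python
from math import comb


def enrichment_ratios(chrom_lengths, locus_counts):
    """Observed/expected locus counts per chromosome.

    Assumed null model: each locus lands on a chromosome with
    probability proportional to that chromosome's length.
    """
    total_len = sum(chrom_lengths.values())
    total_loci = sum(locus_counts.get(c, 0) for c in chrom_lengths)
    ratios = {}
    for chrom, length in chrom_lengths.items():
        expected = total_loci * length / total_len
        ratios[chrom] = locus_counts.get(chrom, 0) / expected
    return ratios


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): a one-sided test for depletion."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical toy data (arbitrary length units, made-up locus counts).
chrom_lengths = {"chr1": 200, "chr2": 150, "chrX": 150}
locus_counts = {"chr1": 50, "chr2": 40, "chrX": 10}

ratios = enrichment_ratios(chrom_lengths, locus_counts)

# Probability of seeing 10 or fewer of 100 loci on chrX if chrX
# really received loci in proportion to its share of the genome.
n_loci = sum(locus_counts.values())
p_chrx = chrom_lengths["chrX"] / sum(chrom_lengths.values())
p_value_chrx = binom_lower_tail(locus_counts["chrX"], n_loci, p_chrx)

for chrom, ratio in ratios.items():
    print(f"{chrom}: observed/expected = {ratio:.2f}")
print(f"chrX depletion p-value = {p_value_chrx:.2e}")
```

A ratio below 1 indicates fewer loci than the length-proportional null predicts. A real analysis would likely need a richer null model, for example one that accounts for gene density or local recombination rate, rather than chromosome length alone.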