DOI: 10.1093/bioinformatics/btt499 ISSN:

A general species delimitation method with applications to phylogenetic placements

Jiajie Zhang, Paschalia Kapli, Pavlos Pavlidis, Alexandros Stamatakis
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Molecular Biology
  • Biochemistry
  • Statistics and Probability

Abstract

Motivation: Sequence-based methods to delimit species are central to DNA taxonomy, microbial community surveys and DNA metabarcoding studies. Current approaches either rely on simple sequence similarity thresholds (OTU-picking) or on complex and compute-intensive evolutionary models. The OTU-picking methods scale well on large datasets, but the results are highly sensitive to the similarity threshold. Coalescent-based species delimitation approaches often rely on Bayesian statistics and Markov Chain Monte Carlo sampling, and can therefore only be applied to small datasets.

Results: We introduce the Poisson tree processes (PTP) model to infer putative species boundaries on a given phylogenetic input tree. We also integrate PTP with our evolutionary placement algorithm (EPA-PTP) to count the number of species in phylogenetic placements. We compare our approaches with popular OTU-picking methods and the General Mixed Yule Coalescent (GMYC) model. For de novo species delimitation, the stand-alone PTP model generally outperforms GYMC as well as OTU-picking methods when evolutionary distances between species are small. PTP neither requires an ultrametric input tree nor a sequence similarity threshold as input. In the open reference species delimitation approach, EPA-PTP yields more accurate results than de novo species delimitation methods. Finally, EPA-PTP scales on large datasets because it relies on the parallel implementations of the EPA and RAxML, thereby allowing to delimit species in high-throughput sequencing data.

Availability and implementation: The code is freely available at www.exelixis-lab.org/software.html.

Contact:  Alexandros.Stamatakis@h-its.org

Supplementary information:  Supplementary data are available at Bioinformatics online.

More from our Archive