DOI: 10.1111/1755-0998.70170 ISSN: 1755-098X

BeeGees: A High‐Throughput Protein‐Coding DNA Barcode Recovery Pipeline Tailored for Genome Skims of Museum Specimens

Daniel A. J. Parsons, Rutger A. Vos, Benjamin W. Price

ABSTRACT

Natural history collections are unparalleled archives of global biodiversity, yet most specimens remain molecularly uncharacterised due to the technical challenges of historical DNA (hDNA), including degradation, low endogenous content and contamination. Genome skimming offers a scalable alternative to PCR‐based barcoding, but existing bioinformatic workflows are not optimised for the heterogeneous, metagenomic nature of museum‐derived data. Here we present BeeGees (Barcode Extraction and Evaluation from Genome Skims), a Snakemake‐based workflow, integrated with high‐performance computing (HPC) infrastructure, designed for protein‐guided recovery and validation of mitochondrial and plastid barcode genes from degraded short‐read genome sequences. BeeGees integrates dual read pre‐processing, systematic per‐sample parameter sweeps, sequential consensus cleaning to remove contaminant sequences and rigorous structural and taxonomic validation against curated reference databases. We benchmarked BeeGees on 1518 museum specimen‐derived genome skims spanning eight phyla. The workflow completed in approximately 120 h (< 5 min per sample) on HPC infrastructure. Excluding sequencing failures (< 1 M reads), validated COI barcodes were recovered for 73.2% of specimens (1050/1435). Barcode recovery success was influenced by endogenous content, preservation quality and parameter choice rather than raw read count alone, highlighting the importance of systematic parameter optimisation. Sequential consensus cleaning eliminated ambiguous bases and reduced chimeric artefacts, proving essential for robust museomic analyses. BeeGees provides a reproducible, scalable framework for high‐throughput barcode recovery from natural history collections, supporting biodiversity genomics and reference gap‐filling initiatives. The BeeGees pipeline is available at: https://github.com/bge-barcoding/BeeGees/.
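The headline benchmark figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract; the variable names are illustrative and are not part of BeeGees itself, and the runtime is the approximate value reported.

```python
# Figures quoted in the abstract (variable names are illustrative only).
total_samples = 1518           # genome skims benchmarked
runtime_hours = 120            # approximate total runtime on HPC
retained_specimens = 1435      # after excluding sequencing failures (< 1 M reads)
validated_coi = 1050           # specimens with a validated COI barcode

# Average wall-clock time per sample, in minutes.
minutes_per_sample = runtime_hours * 60 / total_samples

# Specimens excluded as sequencing failures.
excluded_failures = total_samples - retained_specimens

# Recovery rate among retained specimens, as a percentage.
recovery_pct = 100 * validated_coi / retained_specimens

print(f"{minutes_per_sample:.2f} min per sample")
print(f"{excluded_failures} specimens excluded as sequencing failures")
print(f"{recovery_pct:.1f}% validated COI recovery")
```

The per-sample runtime is an average over the whole benchmark, so it says nothing about how long any individual specimen took.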
