DOI: 10.1093/bioinformatics/btag394 ISSN: 1367-4803

Computational tool choice impacts CRISPR spacer-protospacer detection

Uri Neri, Antonio Pedro Camargo, Brian Bushnell, Rick Beeloo, Simon Roux

Abstract

Motivation

CRISPR spacer-protospacer matching is widely used to infer host–virus interactions in microbial and viromics studies, but the choice of sequence search or alignment tool and its reporting behavior are often under-evaluated for this specific task.

Results

Using synthetic, semi-synthetic, and real datasets, we benchmarked commonly used tools and observed substantial differences in recall, runtime, and resource usage across distance metrics and thresholds. Our analyses support practical defaults for large-scale spacer-target matching and clarify trade-offs between exhaustive and heuristic approaches.

Availability

Source code and benchmark workflows are available at https://github.com/UriNeri/spacer_matching_bench. Data and run artifacts are archived on Zenodo (https://doi.org/10.5281/zenodo.15171878).
